Ontological Queries: Rewriting and Optimization 

(Extended Version)* 






Georg Gottlob, Giorgio Orsi, Andreas Pieris

Department of Computer Science, University of Oxford, UK
Oxford-Man Institute of Quantitative Finance, University of Oxford, UK
Institute for the Future of Computing, University of Oxford, UK

{georg.gottlob,giorgio.orsi,andreas.pieris}@cs.ox.ac.uk



Abstract 

Ontological queries are evaluated against an ontology rather than directly on a database.
The evaluation and optimization of such queries is an intriguing new problem for database
research. In this paper we discuss two important aspects of this problem: query rewriting and
query optimization. Query rewriting consists of the compilation of an ontological query into
an equivalent query against the underlying relational database. The focus here is on soundness
and completeness. We review previous results and present a new rewriting algorithm for rather
general types of ontological constraints. In particular, we show how a conjunctive query against
an ontology can be compiled into a union of conjunctive queries (UCQ) against the underlying
database. Ontological query optimization, in this context, attempts to improve this process so
as to produce possibly small and cost-effective UCQ rewritings for an input query. We review
existing optimization methods, and propose an effective new method that works for linear
Datalog±, a class of Datalog-based rules that encompasses well-known description logics of the
DL-Lite family.



1 Introduction 

This paper is about ontological query processing, an important new challenge to database research. 
We will review existing methods and propose new algorithms for compiling an ontological query, 
that is, a query against an ontology on top of a relational database, into a direct query against 
this database, and we will deal with optimization issues related to this process so as to obtain 
possibly small and efficient compiled queries. In this section, we first discuss a number of relevant 
concepts, and then illustrate query rewriting and optimization processes in the context of a small 
but non-trivial example. 

Ontologies. The use of ontologies and ontological reasoning in companies, governmental
organizations, and other enterprises has become widespread in recent years. An ontology is an explicit
specification of a conceptualization of an area of interest [2] , and consists of a formal representation 
of knowledge as a set of concepts within a domain, and the relationships between those concepts [3]. 
To distinguish an enterprise ontology from a data dictionary, Dave McComb explicitly refers to 
the formal semantics of ontologies that enables automated processing and inferencing, while the 
interpretation of a data dictionary is strictly done by humans [1]. Moreover, ontologies have been 
adopted as high-level conceptual descriptions of the data contained in data repositories that are 
sometimes distributed and heterogeneous in the data models. Due to their high expressive power, 
ontologies are also substituting more traditional conceptual models such as UML class-diagrams 
and E/R schemata. 



*This is an extended and revised version of the paper [T]. 



Description Logics. Description logics (DLs) are logical languages for expressing and
modelling ontologies. The best known DLs are those underlying the OWL language. The main
ontological reasoning and query answering tasks in the complete OWL language, called OWL Full,
are undecidable. For the most well-known decidable fragments of OWL, ontological reasoning and
query answering are still computationally very hard, typically 2EXPTIME-complete.

In description logics, the ontological axioms are usually divided into two sets: the ABox
(assertional box), which essentially contains factual knowledge such as "IBM is a company", denoted
by $company(ibm)$, or "IBM is listed on the NASDAQ", which could be represented as a fact of
the form $list\_comp(ibm, nasdaq)$, and the TBox (terminological box), which contains axioms and
constraints that allow us, on the one hand, to infer new facts from those given in the ABox,
and, on the other hand, to express restrictions such as keys. For example, a TBox may contain
an axiom stating that for each fact $list\_comp(X, Y)$, $Y$ must be a financial index, which in
DL is expressed as $\exists list\_comp^{-} \sqsubseteq fin\_idx$. If the fact $fin\_idx(nasdaq)$ is
not already present in the ABox, it can be derived via the above axiom from
$list\_comp(ibm, nasdaq)$. Thus, the atomic query $q(X) \leftarrow fin\_idx(X)$ would return
$nasdaq$ as one of the answers. Note that the axiom $\exists list\_comp^{-} \sqsubseteq fin\_idx$,
which corresponds to an inclusion dependency, is enforced by adding new tuples, rather than just
being checked. This is one main difference between ontological constraints and classical database
dependencies. In database terms, the above axiom is to be interpreted more like a trigger than a
classical constraint.
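
The trigger-like reading of the axiom can be made concrete with a minimal Python sketch. The encoding of facts as (predicate, arguments) pairs and the helper names `apply_range_axiom` and `answer_atomic_query` are assumptions made for illustration only; they are not part of any DL system discussed here.

```python
# Toy ABox: ground facts encoded as (predicate, tuple of arguments).
# This encoding is an assumption of the sketch, not taken from the paper.
abox = {
    ("company", ("ibm",)),
    ("list_comp", ("ibm", "nasdaq")),
}

def apply_range_axiom(facts, role, concept):
    """Enforce an axiom of the form 'exists role^- is subsumed by concept':
    the second argument of every `role` fact is added as a `concept` fact."""
    derived = {(concept, (args[1],)) for pred, args in facts if pred == role}
    return facts | derived

def answer_atomic_query(facts, concept):
    """Answers to the atomic query q(X) <- concept(X)."""
    return {args[0] for pred, args in facts if pred == concept}

saturated = apply_range_axiom(abox, "list_comp", "fin_idx")

print(answer_atomic_query(abox, "fin_idx"))       # set()
print(answer_atomic_query(saturated, "fin_idx"))  # {'nasdaq'}
```

Over the raw ABox the query has no answer; only after the axiom has been enforced by adding the derived tuple does nasdaq appear, which is exactly the behaviour that distinguishes such axioms from constraints that are merely checked.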

Ontology Based Data Access (OBDA). We are currently witnessing the marriage of
ontological reasoning and database technology. In fact, this amalgamation consists in the realization
of the obvious idea that ABoxes shall be implemented in the form of a relational database, or even
stored in classical RDBMSs. Moreover, very large existing databases are semantically enriched
with ontological constraints. There are a number of recent commercial systems and experimental
prototypes that extend RDBMSs with the possibility of querying an ontology that is rooted in a
database (for examples, see Section 2). The main problem here is how to couple these two different
types of technology smoothly and efficiently, and this is also the main theme of the present paper.

One severe obstacle to efficient OBDA is the already mentioned high computational complexity
of query answering with description logics. The situation clearly worsens when the ABoxes of
enterprise ontologies are very large databases. To tackle this problem, new, lightweight DLs have
been designed that guarantee polynomial-time data complexity for conjunctive query answering.
This means that based on a fixed TBox, a fixed query can be answered in polynomial time over
variable databases. The best-known and best-studied examples of such lightweight DLs are the
DL-Lite [5] and $\mathcal{EL}$ (see, e.g., [6]) families. These languages can be considered tractable
subclasses of OWL. It was convincingly argued that simple DLs such as DL-Lite or $\mathcal{EL}$ are
sufficient for modelling an overwhelming number of applications.

More recently, the Datalog± family of description logics was introduced [3 El [U [TOj. Its syntax
is based on classical first-order logic, more specifically, on variants of the well-known Datalog
language [11]. The basic Datalog± rules are known as tuple-generating dependencies (TGDs) in the
database literature [12]. Tractable DLs in this framework are guarded Datalog±, which is noticeably
more general than both DL-Lite and $\mathcal{EL}$, and the DLs linear Datalog± and sticky-join
Datalog±, which both encompass DL-Lite.

Besides being more expressive than DL-Lite, suitable Datalog± languages offer a more compact
representation of the attributes of concepts and roles, since description logics are usually restricted
to unary and binary predicates only. Consider, as an example, a relation
$stock(id, name, unit\_price)$. Representing this relation in DL would require the introduction of a
concept symbol $stock$, and of three attribute symbols $id$, $name$ and $unit\_price$. These entities
must then be woven together by the TBox formula
$stock \sqsubseteq \exists id \sqcap \exists name \sqcap \exists unit\_price$. Datalog± represents the
relation in a natural way by means of a ternary predicate $stock$. In the same way, Datalog± provides
a more natural syntax for many other DL formulae; for example, an inverse role assertion
$r^{-} \sqsubseteq s$ is represented as a (full) TGD $r(X, Y) \rightarrow s(Y, X)$, while an existential
restriction $p \sqsubseteq \exists r.q$ is represented as a (partial) TGD
$p(X) \rightarrow \exists Y \, r(X, Y), q(Y)$.

First-Order Rewritability. Polynomial-time tractability is often considered not to be good
enough for efficient query processing. Ideally, one would like to achieve the same complexity as
for processing SQL queries, or, equivalently, first-order (FO) queries. An ontology language
$\mathcal{L}$ is first-order rewritable if, for every TBox $\Sigma$ expressed in $\mathcal{L}$ and every
query $q$, a first-order query $q_\Sigma$ (called the perfect rewriting) can be constructed such that,
given a database $D$, $q_\Sigma$ evaluated over $D$ yields exactly the same result as $q$ evaluated
against $D$ and $\Sigma$. Since answering first-order queries is in the class $\mathrm{AC}_0$ in data
complexity [13], it immediately follows that under FO-rewritable TGDs, query answering is also in
$\mathrm{AC}_0$ in data complexity.

This notion was first introduced by Calvanese et al. [5] in the context of description logics. If a
DL guarantees the FO-rewritability of each query under every TBox, we simply say that the logic
is FO-rewritable. FO-rewritability is a most desirable property since it ensures that the reasoning
process can be largely decoupled from data access. In fact, to answer a query $q$, a separate software
component can compile $q$ into $q_\Sigma$, and then just submit $q_\Sigma$ as a standard SQL query to
the DBMS holding $D$, where it is evaluated and optimized in the usual way.

Excitingly, it was shown that the members of the DL-Lite family, as well as the slightly more
expressive language linear Datalog±, are FO-rewritable. Moreover, even the much more expressive
language of sticky-join Datalog± is FO-rewritable. For these languages, a pair
$\langle \Sigma, q \rangle$, where $q$ is a CQ, is rewritten as an SQL expression equivalent to a UCQ
$q_\Sigma$. The research challenge we address in this paper is precisely the question of how to
rewrite $\langle \Sigma, q \rangle$ to $q_\Sigma$ correctly and efficiently. Let us illustrate this
process by a small, but comprehensive example.

Consider the following relational schema $\mathcal{R}$ representing financial information about
companies and their stocks:

stock(id, name, unit_price)
company(name, country, segment)
list_comp(stock, list)
fin_idx(name, type, ref_mkt)
stock_portf(company, stock, qty).

The stock relation contains information about stocks such as the name and the price per unit.
The relation company contains information about companies; in particular, the name, the country,
and the market segment of a company. The relation list_comp relates a stock to a financial index
(e.g., NASDAQ, FTSE, NIKKEI) represented by the relation fin_idx which, in turn, contains
information about the types of stocks in the index, and the reference market (e.g., London Stock
Exchange). Finally, stock_portf relates companies to their stocks along with an indication of the
amount of the investment.

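Datalog± provides the necessary expressive power to extend $\mathcal{R}$ with ontological
constraints in an easy and intuitive way. Examples of such constraints follow:

\[
\begin{array}{lrcl}
\sigma_1: & stock\_portf(X, Y, Z) & \rightarrow & \exists V \exists W \, company(X, V, W) \\
\sigma_2: & stock\_portf(X, Y, Z) & \rightarrow & \exists V \exists W \, stock(Y, V, W) \\
\sigma_3: & list\_comp(X, Y) & \rightarrow & \exists Z \exists W \, fin\_idx(Y, Z, W) \\
\sigma_4: & list\_comp(X, Y) & \rightarrow & \exists Z \exists W \, stock(X, Z, W) \\
\sigma_5: & stock\_portf(X, Y, Z) & \rightarrow & has\_stock(Y, X) \\
\sigma_6: & has\_stock(X, Y) & \rightarrow & \exists Z \, stock\_portf(Y, X, Z) \\
\sigma_7: & stock(X, Y, Z) & \rightarrow & \exists V \exists W \, stock\_portf(V, X, W) \\
\sigma_8: & stock(X, Y, Z) & \rightarrow & fin\_ins(X) \\
\sigma_9: & company(X, Y, Z) & \rightarrow & legal\_person(X) \\
\delta_1: & legal\_person(X), fin\_ins(X) & \rightarrow & \bot.
\end{array}
\]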

The first four TGDs set the "domain" and the "range" of the stock_portf and list_comp relations,
respectively. TGDs $\sigma_5$ and $\sigma_6$ assert that stock_portf and has_stock are "inverse
relations", while $\sigma_7$ expresses that each stock must belong to some stock portfolio. TGDs
$\sigma_8$ and $\sigma_9$ model taxonomic relationships such as the facts that each stock is a
financial instrument, and each company is a legal person. Finally, the negative constraint
$\delta_1$ (where $\bot$ denotes the truth constant false) states that legal persons and financial
instruments are disjoint sets.



Figure 1: A (partial) rewriting for the Stock Exchange example.

\[
\begin{array}{rcl}
q^{[0]}(A, B, C) & \leftarrow & fin\_ins(A), stock\_portf(B, A, D), company(B, E, F), list\_comp(A, C), fin\_idx(C, G, H) \\
q^{[1]}(A, B, C) & \leftarrow & fin\_ins(A), \underline{has\_stock(A, B)}, company(B, E, F), list\_comp(A, C), fin\_idx(C, G, H) \\
q^{[2]}(A, B, C) & \leftarrow & fin\_ins(A), has\_stock(A, B), \underline{stock\_portf(B, E, F)}, list\_comp(A, C), fin\_idx(C, G, H) \\
q^{[3]}(A, B, C) & \leftarrow & \underline{stock(A, J, K)}, has\_stock(A, B), stock\_portf(B, E, F), list\_comp(A, C), fin\_idx(C, G, H)
\end{array}
\]



Consider now the following conjunctive query $q$ asking for all the triples $\langle a, b, c \rangle$,
where $a$ is a financial instrument owned by the company $b$ and listed on $c$:

\[
q(A, B, C) \leftarrow fin\_ins(A), stock\_portf(B, A, D), company(B, E, F), list\_comp(A, C), fin\_idx(C, G, H).
\]

Since $\Sigma = \{\sigma_1, \ldots, \sigma_9\}$ is a set of linear TGDs, i.e., TGDs with a single
body-atom, query answering under $\Sigma$ is FO-rewritable. Thus, it is possible to reformulate
$\langle \Sigma, q \rangle$ to a first-order query $q_\Sigma$ such that, for every database $D$,
$D \cup \Sigma \models q$ iff $D \models q_\Sigma$. A naive rewriting procedure would use the
TGDs of $\Sigma$ as rewriting rules for the atoms in $q$ to generate all the CQs of the perfect
rewriting. Figure 1 shows a (partial) rewriting for $q$, where the query obtained at the $i$-th step
is denoted as $q^{[i]}$, and the newly introduced atoms are underlined. In particular, $q^{[0]}$ is
the given query $q$, $q^{[1]}$ is obtained from $q^{[0]}$ by using $\sigma_6$, $q^{[2]}$ is obtained
from $q^{[1]}$ by applying $\sigma_1$, and $q^{[3]}$ is obtained from $q^{[2]}$ by using $\sigma_8$.
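
A single step of such a naive procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm proposed in this paper: TGDs are linear, constant-free and have a single head atom; variables are strings starting with an uppercase letter; the factorization step is omitted; and `rewrite_atom` together with the fresh names `V1`, `V2`, ... are hypothetical names introduced here.

```python
from collections import Counter

def is_variable(term):
    # Convention of this sketch: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def rewrite_atom(head_vars, body, index, tgd):
    """Try to replace body[index] using a linear TGD given as
    (tgd_body_atom, tgd_head_atom, existential_variables).
    Returns the new query body, or None if the step is not applicable."""
    (b_pred, b_args), (h_pred, h_args), existentials = tgd
    q_pred, q_args = body[index]
    if q_pred != h_pred or len(q_args) != len(h_args):
        return None
    # Count how often each term occurs in the whole query (head and body).
    occurrences = Counter(head_vars)
    for _, args in body:
        occurrences.update(args)
    theta = {}
    for h_term, q_term in zip(h_args, q_args):
        if h_term in existentials:
            # An existential position may only meet a variable that occurs
            # nowhere else in the query (neither in the head nor in a join).
            if not is_variable(q_term) or occurrences[q_term] != 1:
                return None
        if theta.setdefault(h_term, q_term) != q_term:
            return None  # would need unification of query terms; not covered
    # TGD body variables that do not occur in the TGD head get fresh names.
    used = set(occurrences)
    n = 0
    for term in b_args:
        if term not in theta:
            n += 1
            while f"V{n}" in used:
                n += 1
            used.add(f"V{n}")
            theta[term] = f"V{n}"
    new_atom = (b_pred, tuple(theta[t] for t in b_args))
    return body[:index] + [new_atom] + body[index + 1:]

sigma1 = (("stock_portf", ("X", "Y", "Z")), ("company", ("X", "V", "W")), {"V", "W"})
sigma6 = (("has_stock", ("X", "Y")), ("stock_portf", ("Y", "X", "Z")), {"Z"})
sigma8 = (("stock", ("X", "Y", "Z")), ("fin_ins", ("X",)), set())

head = ("A", "B", "C")
q0 = [("fin_ins", ("A",)), ("stock_portf", ("B", "A", "D")),
      ("company", ("B", "E", "F")), ("list_comp", ("A", "C")),
      ("fin_idx", ("C", "G", "H"))]
q1 = rewrite_atom(head, q0, 1, sigma6)  # stock_portf(B,A,D) becomes has_stock(A,B)
q2 = rewrite_atom(head, q1, 2, sigma1)  # company(B,E,F) becomes stock_portf(B,V1,V2)
q3 = rewrite_atom(head, q2, 0, sigma8)  # fin_ins(A) becomes stock(A,V3,V4)
for step in (q1, q2, q3):
    print(step)
```

Up to the names of the fresh variables, the three calls reproduce the steps of Figure 1. The applicability test matters: if the variable D also occurred in the head of the query, the atom stock_portf(B, A, D) could not be replaced using the TGD with head stock_portf(Y, X, Z), because D would then have to match the existentially quantified Z, and the function returns None.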

The complete perfect rewriting contains more than 200 queries executing more than 1000 joins.
However, by exploiting the set of constraints, it is possible to eliminate redundant atoms in the
generated queries, and thus reduce the number of the CQs in the rewritten query. For example,
in the given query $q$ above it is possible to eliminate the atom $fin\_ins(A)$ since, due to the
existence of the TGDs $\sigma_2$ and $\sigma_8$ in $\Sigma$, if the atom $stock\_portf(B, A, D)$ is
satisfied, then immediately the atom $fin\_ins(A)$ is also satisfied. Notice that by eliminating a
redundant atom from a query, we also eliminate all the atoms that are generated starting from it
during the rewriting process. Moreover, due to the TGD $\sigma_3$, if the atom $list\_comp(A, C)$
in $q$ is satisfied, then the atom $fin\_idx(C, G, H)$ is also satisfied, and therefore can be
eliminated. Finally, due to the TGD $\sigma_1$, if the atom $stock\_portf(B, A, D)$ is satisfied,
then the atom $company(B, E, F)$ is also satisfied, and hence is redundant. The query that has to
be considered as input of the rewriting process is therefore
$q(A, B, C) \leftarrow stock\_portf(B, A, D), list\_comp(A, C)$, which produces a perfect rewriting
containing the following two queries executing only two joins:

\[
\begin{array}{rcl}
q(A, B, C) & \leftarrow & list\_comp(A, C), stock\_portf(B, A, D) \\
q(A, B, C) & \leftarrow & list\_comp(A, C), has\_stock(A, B).
\end{array}
\]
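
To illustrate how such a two-query rewriting can be handed to a relational engine, the following Python sketch evaluates it as a single SQL `UNION` over an in-memory SQLite database. The table contents, the column names of the has_stock table (stock, company) and the hand-written SQL translation are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list_comp(stock TEXT, list TEXT);
CREATE TABLE stock_portf(company TEXT, stock TEXT, qty INTEGER);
CREATE TABLE has_stock(stock TEXT, company TEXT);  -- assumed column names
INSERT INTO list_comp VALUES ('ibm_s', 'nasdaq'), ('bp_s', 'ftse');
INSERT INTO stock_portf VALUES ('acme', 'ibm_s', 100);
INSERT INTO has_stock VALUES ('bp_s', 'globex');
""")

# One SELECT block per conjunctive query of the rewriting; UNION combines
# them and removes duplicate answers.
UCQ_SQL = """
SELECT l.stock, p.company, l.list
FROM list_comp AS l JOIN stock_portf AS p ON p.stock = l.stock
UNION
SELECT l.stock, h.company, l.list
FROM list_comp AS l JOIN has_stock AS h ON h.stock = l.stock
"""

answers = sorted(conn.execute(UCQ_SQL).fetchall())
print(answers)
# [('bp_s', 'globex', 'ftse'), ('ibm_s', 'acme', 'nasdaq')]
```

The second answer tuple is found only through the has_stock disjunct, although no stock_portf tuple for that company is stored; the ontological constraints are accounted for entirely by the rewritten query, and the database itself is left untouched.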

Contributions and Roadmap. After a review of previous work on ontology based data
access in the next section, and some formal definitions and preliminaries in Section 3, we present a
short overview of the Datalog± family in Section 4. We then proceed with new research results. In
Section 5 we propose a new rewriting algorithm that improves the one stated in [14] by substantially
reducing the number of redundant queries in the perfect rewriting. In Section 6 we present a
polynomial-time optimization strategy based on the early pruning of redundant atoms produced
during the rewriting process. An implementation and experimental evaluation of the new method
is discussed in Section 7. We also discuss the relationship between our optimization technique and
optimal query minimization algorithms such as the chase & back-chase algorithm [15]. We conclude
with a brief outlook on further research.

2 Ontology Based Data Access 

Answering queries under constraints and the related optimization techniques are important topics
in data management beyond the obvious research interest. In particular, they are profitable
opportunities for companies that need to deliver efficient and effective data management solutions to
their customers. This trend is becoming even more evident as a plethora of robust systems and
APIs for Semantic Web data management have been proposed in recent years. These systems span
from open-source solutions such as Virtuoso, Sesame, RDFSuite [TB], KAON and Jena, to commercial
implementations such as the semantic extensions implemented in Oracle Database 11g R2 [17]
and BigOWLIM. In this section we briefly analyze the systems providing rewriting-based access
to databases under ontological constraints, and we highlight some crucial points that we want to
address in this work.

We first present the class of constraints identified by the members of the DL-Lite family [5],
namely, DL-Lite$_\mathcal{A}$, DL-Lite$_\mathcal{F}$, and DL-Lite$_\mathcal{R}$, underlying the W3C
OWL-QL profile of the OWL language. These constraints correspond to unary and binary inclusion
dependencies combined with a restricted form of key constraints. In order to perform query
answering under this class of constraints, a rewriting algorithm, introduced in [5] and implemented
in the QuOnto system, reformulates the given query into a union of conjunctive queries. The size
of the reformulated query is unnecessarily large for a number of reasons. In the first place,
(i) basic optimization techniques such as the identification of the connected components in the
body of the input query, or the computation of any form of query decomposition [18], are not
applied. Moreover, (ii) the fact that the given set of constraints can be used to identify
existential joins in the reformulated query which can be eliminated is not exploited. Finally,
(iii) the factorization step (which is needed to guarantee completeness) is applied exhaustively,
and as a result many superfluous queries are generated.

Pérez-Urbina et al. [H] proposed an alternative resolution-based rewriting algorithm,
implemented in the Requiem system, that addresses the issue of the useless factorizations (and
therefore of the redundant queries generated due to this weakness) by directly handling existential
quantification through proper functional terms. The algorithm has since been extended to more
expressive DL languages [19]. In this case the output of the rewriting is a Datalog program.

Rosati et al. [20] recently proposed a very sophisticated rewriting technique, implemented in the 
Presto system, that addresses some of the issues described above. In particular, (i) the unnecessary 
existential joins are eliminated by resorting to the concept of most-general subsumees, which also 
avoids the unnecessary factorizations, and (ii) the connectivity of the given query is checked before
executing the algorithm; in case the query is not connected, Presto splits the query into connected
components and rewrites them separately. Notice that Presto produces a non-recursive Datalog
program, and not a union of conjunctive queries. This allows the "hiding" of the exponential
blow-up inside the rules instead of generating explicitly the disjunctive normal form. The final
rewriting is exponential only in the number of non-eliminable existential joins, but not in the size
of the input query.

The approaches presented above have proven very effective when applied to very particular
classes of description logic constraints. Following a more general approach for ontological query
answering, Calì et al. [Hj presented a backward-chaining rewriting algorithm which is able to deal
with arbitrary sets of TGDs, provided that the class of TGDs under consideration satisfies suitable
syntactic restrictions that guarantee the termination of the algorithm. However, this algorithm is
inspired by the original QuOnto algorithm and inherits all its drawbacks.

Despite the possibly exponential number of queries to be constructed, we know that all such
queries are independent from each other, and thus they can be easily executed in parallel threads
and distributed on multiple processors. Notice that a non-recursive Datalog program is not equally
easy to distribute. Moreover, the optimizations implemented in current DBMSs for (unions
of) conjunctive queries are much more advanced than those implemented for the positive existential
first-order queries resulting from the translation of a non-recursive Datalog program into a concrete
query language such as SQL. It is clear that a trade-off between these two approaches must be found
in order to exploit as much as possible the current optimization techniques, while keeping the size
of the rewriting reasonably small in order to make its execution feasible in practice.

Virtuoso: http://virtuoso.openlinksw.com/
Sesame: http://www.openrdf.org/
KAON: http://kaon.semanticweb.org/
Jena: http://jena.sourceforge.net/
BigOWLIM: http://www.ontotext.com/owlim/

A related research field is that of query minimization [21], in particular, in the presence of views
and constraints [22, 15]. Given a conjunctive query $q$, and a set of constraints $\Sigma$, the goal is
to find all the minimal equivalent reformulations of $q$ under the constraints of $\Sigma$. The most
interesting approach in this respect is the chase & back-chase algorithm (C&B) [15], implemented in
the MARS system [23]. The algorithm freezes the atoms of $body(q)$ and, by considering them as a
database $D_q$, applies the following two steps. During the chase step, the chase of $D_q$ w.r.t.
$\Sigma$ is constructed, and then the atoms of $chase(D_q, \Sigma)$ are considered as the body-atoms
of a query $q_U$, called the universal plan. The back-chase step considers all the possible subsets
of the atoms of $body(q_U)$, starting from those with a single atom, each of which is then considered
as the body of a query $q'$. Whenever there exists a containment mapping from $body(q_U)$ to
$chase(D_{q'}, \Sigma)$, where $D_{q'}$ is the database obtained by freezing $body(q')$, then $q'$ is
an equivalent reformulation of $q$. Moreover, every time an equivalent reformulation $q'$ is found,
the back-chase does not consider any of the supersets of the atoms of $body(q')$ because they will be
automatically implied by the atoms of $q'$, and therefore the produced query would be redundant.
This particular exploration strategy guarantees the minimality of the reformulations. A
non-negligible drawback of this approach is the fact that we need to compute the chase of $D_q$
w.r.t. $\Sigma$, and also the chase for the (exponentially many) databases $D_{q'}$ w.r.t. $\Sigma$.
Clearly, this makes the procedure computationally expensive.

3 Preliminaries 

In this section we recall some basics on relational databases, conjunctive queries, tuple-generating 
dependencies, and the chase procedure. 

3.1 Relational Databases and Conjunctive Queries 

Consider two disjoint (infinite) sets of symbols $\Delta_c$ and $\Delta_N$ such that: $\Delta_c$ is a
set of constants (which constitutes the domain of a database), and $\Delta_N$ is a set of labeled
nulls (used as placeholders for unknown values). Different constants represent different values
(unique name assumption), while different nulls may represent the same value. Throughout the paper,
we denote by $\mathbf{X}$ sequences of variables $X_1, \ldots, X_k$, where $k \geq 0$, and by $[n]$
the set $\{1, \ldots, n\}$, for any $n \geq 1$.

A relational schema $\mathcal{R}$ (or simply schema) is a set of relational symbols (or predicate
symbols), each with its associated arity. A position $r[i]$ (or $\langle r, i \rangle$) is identified
by a predicate $r \in \mathcal{R}$ and its $i$-th argument. A term $t$ is a constant, labeled null,
or variable. An atomic formula (or simply atom) has the form $r(t_1, \ldots, t_n)$, where
$r \in \mathcal{R}$ has arity $n$, and $t_1, \ldots, t_n$ are terms. Conjunctions of atoms are often
identified with the sets of their atoms.

A substitution from one set of symbols $S_1$ to another set of symbols $S_2$ is a function
$h : S_1 \rightarrow S_2$. A homomorphism from a set of atoms $A_1$ to a set of atoms $A_2$, both
over the same schema $\mathcal{R}$, is a substitution $h$ from the set of terms of $A_1$ to the set
of terms of $A_2$ such that: (i) if $t \in \Delta_c$, then $h(t) = t$, and (ii) if
$r(t_1, \ldots, t_n)$ is in $A_1$, then $h(r(t_1, \ldots, t_n)) = r(h(t_1), \ldots, h(t_n))$ is in
$A_2$. The notion of homomorphism naturally extends to conjunctions of atoms.

A relational instance (or simply instance) $I$ for a schema $\mathcal{R}$ is a (possibly infinite)
set of atoms of the form $r(\mathbf{t})$, where $r \in \mathcal{R}$ has arity $n$ and
$\mathbf{t} \in (\Delta_c \cup \Delta_N)^n$. A database is a finite relational instance.
A conjunctive query (CQ) $q$ of arity $n$ over a schema $\mathcal{R}$ is a formula of the form
$q(\mathbf{X}) \leftarrow \varphi(\mathbf{X}, \mathbf{Y})$, where $\varphi(\mathbf{X}, \mathbf{Y})$
is a conjunction of atoms over $\mathcal{R}$, and $q$ is an $n$-ary predicate.
$\varphi(\mathbf{X}, \mathbf{Y})$ is called the body of $q$, denoted as $body(q)$, and
$q(\mathbf{X})$ is the head of $q$, denoted as $head(q)$. A Boolean conjunctive query (BCQ) is a CQ
of arity zero. The answer to a CQ $q$ of arity $n$ over an instance $I$, denoted as $q(I)$, is the
set of all $n$-tuples $\mathbf{t} \in (\Delta_c)^n$ for which there exists a homomorphism
$h : \mathbf{X} \cup \mathbf{Y} \rightarrow \Delta_c \cup \Delta_N$ such that
$h(\varphi(\mathbf{X}, \mathbf{Y})) \subseteq I$ and $h(\mathbf{X}) = \mathbf{t}$. A BCQ has only the
empty tuple $\langle\rangle$ as possible answer, in which case it is said to have a positive answer.
Formally, a BCQ has a positive answer over $I$, denoted as $I \models q$, iff
$\langle\rangle \in q(I)$. A union of CQs (UCQ) $Q$ of arity $n$ is a set of CQs, where each
$q \in Q$ has the same arity $n$ and uses the same predicate symbol in the head. The answer to $Q$
over an instance $I$, denoted as $Q(I)$, is defined as the set of tuples
$\{\mathbf{t} \mid \text{there exists } q \in Q \text{ such that } \mathbf{t} \in q(I)\}$.
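
The definition of $q(I)$ can be read operationally as a search for homomorphisms from the query body into the instance. The Python sketch below is a naive backtracking evaluator written for illustration only; the encoding is an assumption of the sketch: atoms are (predicate, arguments) pairs, variables are strings starting with an uppercase letter, constants are lowercase strings, and labeled nulls are strings starting with `_:`.

```python
def is_variable(term):
    # Convention of this sketch: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def is_null(value):
    # Convention of this sketch: labeled nulls are written "_:z1", "_:z2", ...
    return isinstance(value, str) and value.startswith("_:")

def homomorphisms(atoms, instance, h=None):
    """Yield every mapping h of the variables in `atoms` such that the image
    of each atom under h belongs to `instance`; constants map to themselves."""
    h = dict(h or {})
    if not atoms:
        yield h
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in instance:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(h)
        compatible = True
        for term, value in zip(args, fact_args):
            if is_variable(term):
                if extended.setdefault(term, value) != value:
                    compatible = False
                    break
            elif term != value:
                compatible = False
                break
        if compatible:
            yield from homomorphisms(rest, instance, extended)

def evaluate_cq(head_vars, body, instance):
    """q(I): images of the head variables that consist of constants only."""
    answers = set()
    for h in homomorphisms(body, instance):
        candidate = tuple(h[x] for x in head_vars)
        if not any(is_null(v) for v in candidate):
            answers.add(candidate)
    return answers

def evaluate_ucq(queries, instance):
    """Q(I): union of the answers of the member CQs."""
    return set().union(*(evaluate_cq(hv, b, instance) for hv, b in queries))

instance = {
    ("list_comp", ("ibm_s", "nasdaq")),
    ("stock_portf", ("acme", "ibm_s", "100")),
    ("fin_idx", ("nasdaq", "_:z1", "_:z2")),
}
q = (("A", "B", "C"), [("list_comp", ("A", "C")), ("stock_portf", ("B", "A", "D"))])
print(evaluate_cq(*q, instance))                                          # {('ibm_s', 'acme', 'nasdaq')}
print(evaluate_cq(("C", "T"), [("fin_idx", ("C", "T", "R"))], instance))  # set()
```

The second query has a homomorphism into the instance, but it maps the head variable T to a labeled null, so the candidate tuple is discarded: answers range over constants only. For a BCQ, `evaluate_cq((), body, instance)` returns either the empty set or the set containing the empty tuple.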

3.2 Tuple-Generating Dependencies 

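A tuple-generating dependency (TGD) $\sigma$ over a schema $\mathcal{R}$ is a first-order formula
$\forall \mathbf{X} \forall \mathbf{Y} \, \varphi(\mathbf{X}, \mathbf{Y}) \rightarrow \exists \mathbf{Z} \, \psi(\mathbf{X}, \mathbf{Z})$,
where $\varphi(\mathbf{X}, \mathbf{Y})$ and $\psi(\mathbf{X}, \mathbf{Z})$ are conjunctions of atoms
over $\mathcal{R}$, called the body and the head of $\sigma$, denoted as $body(\sigma)$ and
$head(\sigma)$, respectively. Henceforth, to avoid notational clutter, we will omit the universal
quantifiers in TGDs. Such $\sigma$ is satisfied by an instance $I$ for $\mathcal{R}$ iff, whenever
there exists a homomorphism $h$ such that $h(\varphi(\mathbf{X}, \mathbf{Y})) \subseteq I$, there
exists an extension $h'$ of $h$ (i.e., $h' \supseteq h$) such that
$h'(\psi(\mathbf{X}, \mathbf{Z})) \subseteq I$.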

We now define the notion of query answering under TGDs. Given a database $D$ for $\mathcal{R}$, and
a set $\Sigma$ of TGDs over $\mathcal{R}$, the models of $D$ w.r.t. $\Sigma$, denoted as
$mods(D, \Sigma)$, is the set of all instances $I$ such that $I \models D \cup \Sigma$, which means
that $I \supseteq D$ and $I$ satisfies $\Sigma$. The answer to a CQ $q$ w.r.t. $D$ and $\Sigma$,
denoted as $ans(q, D, \Sigma)$, is the set
$\{\mathbf{t} \mid \mathbf{t} \in q(I) \text{ for each } I \in mods(D, \Sigma)\}$. The answer to a
BCQ $q$ w.r.t. $D$ and $\Sigma$ is positive, denoted as $D \cup \Sigma \models q$, iff
$ans(q, D, \Sigma) \neq \emptyset$. Note that query answering under general TGDs is undecidable [23],
even when the schema and the set of TGDs are fixed [25].
We recall that the two problems of answering CQs and BCQs under TGDs are equivalent |2H I26j.
Roughly speaking, we can enumerate the polynomially many tuples of constants which are possible
answers to $q$, and then, instead of answering the given query $q$, we answer the polynomially many
BCQs that we obtain by replacing the variables in the body of $q$ with the appropriate constants.
A certain tuple $\mathbf{t}$ of constants is in the answer of $q$ iff the answer to the BCQ that we
obtain from $\mathbf{t}$ is positive. Henceforth, we thus focus only on the BCQ answering problem.

3.3 The TGD Chase 

The chase procedure (or simply chase) is a fundamental algorithmic tool introduced for checking
implication of dependencies [27], and later for checking query containment [28]. Informally, the
chase is a process of repairing a database w.r.t. a set of dependencies so that the resulting
database satisfies the dependencies. We shall use the term chase interchangeably for both the
procedure and its result. The chase works on an instance through the so-called TGD chase rule.

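TGD Chase Rule: Consider a database $D$ for a schema $\mathcal{R}$, and a TGD
$\sigma : \varphi(\mathbf{X}, \mathbf{Y}) \rightarrow \exists \mathbf{Z} \, \psi(\mathbf{X}, \mathbf{Z})$
over $\mathcal{R}$. If $\sigma$ is applicable to $D$, i.e., there exists a homomorphism $h$ such that
$h(\varphi(\mathbf{X}, \mathbf{Y})) \subseteq D$, then: (i) define $h' \supseteq h$ such that
$h'(Z_i) = z_i$, for each $Z_i \in \mathbf{Z}$, where $z_i \in \Delta_N$ is a "fresh" labeled null
not introduced before, and (ii) add to $D$ the set of atoms in $h'(\psi(\mathbf{X}, \mathbf{Z}))$, if
not already in $D$.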

Given a database $D$ and a set of TGDs $\Sigma$, the chase algorithm for $D$ and $\Sigma$ consists of
an exhaustive application of the TGD chase rule in a breadth-first fashion, which leads as result to
a (possibly infinite) chase for $D$ and $\Sigma$, denoted as $chase(D, \Sigma)$. For the formal
definition of the chase algorithm we refer the reader to [8].
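
The chase rule and its breadth-first application can be sketched in Python as below. This is an illustrative, truncated version under stated assumptions, not the formal algorithm of [8]: since the chase may be infinite, the loop stops after `max_rounds` rounds; each pair of TGD and homomorphism is fired at most once; TGDs are triples (body atoms, head atoms, existential variables); variables are uppercase-initial strings; and fresh nulls are named `_:z1`, `_:z2`, ...

```python
from itertools import count

def is_variable(term):
    # Convention of this sketch: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def matches(atoms, facts, h=None):
    """Yield every homomorphism from the list `atoms` into the set `facts`."""
    h = dict(h or {})
    if not atoms:
        yield h
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(h)
        if all((extended.setdefault(t, v) == v) if is_variable(t) else (t == v)
               for t, v in zip(args, fact_args)):
            yield from matches(rest, facts, extended)

def chase(database, tgds, max_rounds=3):
    """Breadth-first chase, truncated after `max_rounds` rounds."""
    facts = set(database)
    fresh = count(1)
    fired = set()  # (TGD index, homomorphism) pairs that were already applied
    for _ in range(max_rounds):
        new_facts = set()
        for i, (body, head, existentials) in enumerate(tgds):
            for h in matches(body, facts):
                key = (i, tuple(sorted(h.items())))
                if key in fired:
                    continue
                fired.add(key)
                extended = dict(h)
                for z in sorted(existentials):
                    extended[z] = f"_:z{next(fresh)}"  # fresh labeled null
                for pred, args in head:
                    new_facts.add((pred, tuple(extended.get(t, t) for t in args)))
        if new_facts <= facts:
            break  # nothing new was derived: the chase has terminated
        facts |= new_facts
    return facts

sigma3 = ([("list_comp", ("X", "Y"))], [("fin_idx", ("Y", "Z", "W"))], {"Z", "W"})
sigma4 = ([("list_comp", ("X", "Y"))], [("stock", ("X", "Z", "W"))], {"Z", "W"})
sigma8 = ([("stock", ("X", "Y", "Z"))], [("fin_ins", ("X",))], set())

result = chase({("list_comp", ("ibm_s", "nasdaq"))}, [sigma3, sigma4, sigma8])
for fact in sorted(result):
    print(fact)
```

For these three TGDs the chase terminates with four atoms: the original fact, one fin_idx atom and one stock atom (each with two fresh nulls), and the atom fin_ins(ibm_s). Adding the TGDs that relate stock and stock_portf in both directions to the set would make the untruncated chase infinite, which is why the sketch bounds the number of rounds.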

The (possibly infinite) chase for $D$ and $\Sigma$ is a universal model of $D$ w.r.t. $\Sigma$, i.e.,
for each instance $I \in mods(D, \Sigma)$, there exists a homomorphism from $chase(D, \Sigma)$ to
$I$ [26, 29]. Using this fact it can be shown that $D \cup \Sigma \models q$ iff
$chase(D, \Sigma) \models q$, for every BCQ $q$.

4 The Datalog± Family

In this section we present the main Datalog± languages under which query answering is decidable,
and (in almost all cases) also tractable in data complexity.



4.1 Decidability Paradigms 



We first discuss the three main paradigms for ensuring decidability of query answering, namely,
chase termination, guardedness and stickiness.

Chase Termination. In this case the chase always terminates and produces a finite universal
model $U$. Thus, given a query we just need to evaluate it over the finite database $U$. The most
notable syntactic restriction of TGDs guaranteeing chase termination is weak-acyclicity, which is
defined by means of a graph-based condition, for which we refer the reader to the landmark paper [29].
Roughly speaking, in the chase constructed under a weakly-acyclic set of TGDs over a schema
$\mathcal{R}$, only a finite number of distinct values can appear at any position of $\mathcal{R}$,
and thus after finitely many steps the chase procedure terminates. It is known that query answering
under a weakly-acyclic set of TGDs is PTIME-complete [29] and 2EXPTIME-complete [10] in data and
combined complexity, respectively. More general syntactic restrictions that guarantee chase
termination were proposed in [26] and [30].

Guardedness. Guarded TGDs, introduced and studied in [25], have an atom in their body,
called the guard, that contains all the universally quantified variables. For example, the TGD
$r(X, Y), s(X, Y, Z) \rightarrow \exists W \, s(Z, X, W)$ is guarded via the guard atom
$s(X, Y, Z)$, while the TGD $r(X, Y), r(Y, Z) \rightarrow r(X, Z)$ is not. Decidability of query
answering follows from the fact that the chase constructed under a set of guarded TGDs has the
bounded treewidth property, i.e., is a "tree-like" structure. The data and combined complexity of
query answering under a set of guarded TGDs is PTIME-complete [7] and 2EXPTIME-complete [25],
respectively.

Linear TGDs, proposed in [7], are an FO-rewritable variant of guarded TGDs. A TGD is linear iff
it contains only one atom in its body. A linear TGD is trivially guarded since the singleton
body-atom is automatically a guard. Linear TGDs are more expressive than the well-known class
of inclusion dependencies. Query answering under linear TGDs is in the highly tractable class
$\mathrm{AC}_0$ in data complexity [7]. The same problem is PSPACE-complete in combined complexity;
this result is immediately implied by results in [28].
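
Both guardedness and linearity are purely syntactic conditions on the body of a TGD, so they are easy to test. The following Python sketch checks them for TGD bodies given as lists of (predicate, arguments) atoms; the encoding and the function names `guards`, `is_guarded` and `is_linear` are assumptions of this illustration.

```python
def is_variable(term):
    # Convention of this sketch: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def variables(atom):
    _, args = atom
    return {t for t in args if is_variable(t)}

def guards(body):
    """Return the body atoms that contain all the variables of the body,
    i.e., all the universally quantified variables of the TGD."""
    all_vars = set().union(*(variables(atom) for atom in body))
    return [atom for atom in body if variables(atom) == all_vars]

def is_guarded(body):
    return bool(guards(body))

def is_linear(body):
    return len(body) == 1

body_guarded = [("r", ("X", "Y")), ("s", ("X", "Y", "Z"))]   # r(X,Y), s(X,Y,Z)
body_unguarded = [("r", ("X", "Y")), ("r", ("Y", "Z"))]      # r(X,Y), r(Y,Z)
body_linear = [("stock", ("X", "Y", "Z"))]                   # body of a linear TGD

print(guards(body_guarded))        # [('s', ('X', 'Y', 'Z'))]
print(is_guarded(body_unguarded))  # False
print(is_linear(body_linear), is_guarded(body_linear))  # True True
```

The first two bodies are the ones of the guarded and non-guarded example TGDs given above; the last line illustrates that a single body atom is always its own guard.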

An expressive class, which forms a generalization of guarded TGDs, is the class of
weakly-guarded sets of TGDs introduced in [25]. Intuitively speaking, a set $\Sigma$ of TGDs is
weakly-guarded iff in the body of each TGD of $\Sigma$ there exists an atom, called the weak-guard,
that contains all the universally quantified variables that appear only at positions where a
"fresh" null of $\Delta_N$ can appear during the construction of the chase. Query answering under a
weakly-guarded set of TGDs is EXPTIME-complete [25] and 2EXPTIME-complete [25] in data and combined
complexity, respectively.

Stickiness. In this paragraph we present a Datalog± language (and its extensions), which
hinges on a paradigm that is very different from guardedness. Sticky sets of TGDs are defined
formally by an efficiently testable condition involving variable-marking [9]. In what follows we just
give an intuitive definition of this class. For every database $D$, assume that during the
construction of the chase of $D$ under a sticky set $\Sigma$ of TGDs, we apply a TGD
$\sigma \in \Sigma$ that has a variable $V$ appearing more than once in its body; assume also that
$V$ maps (via homomorphism) on the symbol $z$, and that by virtue of this application the atom
$\underline{a}$ is introduced. In this case, for each atom $\underline{b}$ in $body(\sigma)$,
we say that $\underline{a}$ is derived from $\underline{b}$. Then, we have that $z$ appears in
$\underline{a}$ and in all atoms resulting from some chase derivation sequence starting from
$\underline{a}$, "sticking" to them (hence the name "sticky" sets of TGDs). Interestingly, sticky
sets of TGDs are FO-rewritable, and thus query answering is feasible in $\mathrm{AC}_0$ in data
complexity [9]. Combined complexity of query answering is known to be EXPTIME-complete [9].

In [10] the FO-rewritable class of sticky-join sets of TGDs, which captures both linear TGDs
and sticky sets of TGDs, is introduced. Similarly to sticky sets of TGDs, sticky-join sets are
defined formally by a testable condition based on variable-marking. However, this variable-marking
procedure is more sophisticated than the one used for sticky sets, and due to this fact the problem of
identifying sticky-join sets of TGDs is harder than the one of identifying sticky sets. In particular,
given a set Σ of TGDs, we can decide in PTIME whether Σ is sticky, while the problem whether Σ
is sticky-join is PSPACE-complete. Note that the data and combined complexity of query answering
under sticky and sticky-join sets of TGDs coincide.



4.2 Additional Features 



In this subsection we briefly discuss how the languages presented above can be combined with 
negative constraints and key dependencies, without altering the complexity of query answering. 

Negative Constraints. A negative constraint (NC) ν over a schema R is a first-order formula
∀X φ(X) → ⊥, where ⊥ denotes the truth constant false. NCs are vital when representing ontologies
(see, e.g., [HE]), as well as conceptual schemas such as Entity-Relationship diagrams (see, e.g.,
[3n [32]). With NCs we can assert, for example, that students and professors are disjoint sets:
∀X student(X), professor(X) → ⊥. Also, we can state that a student cannot be the leader of a
research group: ∀X∀Y student(X), leads(X,Y) → ⊥.

It is known that checking NCs is tantamount to query answering [7]. In particular, given an
instance I, a set Σ_⊥ of NCs, and a set Σ of TGDs, for each NC ν of the form ∀X φ(X) → ⊥, we
answer the BCQ q_ν() ← φ(X). If at least one of such queries answers positively, then I ∪ Σ ∪ Σ_⊥ ⊨ ⊥
(i.e., the theory is inconsistent), and therefore I ∪ Σ ∪ Σ_⊥ ⊨ q, for every BCQ q; otherwise, given
a BCQ q, we have I ∪ Σ ∪ Σ_⊥ ⊨ q iff I ∪ Σ ⊨ q, i.e., we can answer q by ignoring the set of NCs.
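As a rough illustration of the check described above, the following Python sketch treats the body of each NC as a BCQ and looks for a homomorphism into a finite set of facts. The tuple encoding of atoms, the convention that variables start with an upper-case letter, and the names `homomorphisms` and `violated_ncs` are assumptions made only for this sketch. It evaluates the queries over a given finite instance; the TGDs would still have to be taken into account, for instance by running it on a rewriting or on a finite portion of the chase.

```python
def is_var(term):
    # Convention assumed for this sketch: variables start with an upper-case letter.
    return term[:1].isupper()

def homomorphisms(atoms, facts, h=None):
    """Yield each mapping of the variables of `atoms` that sends every atom into `facts`."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        g = dict(h)
        # Variables must be mapped consistently; constants must match exactly.
        if all(g.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(args, fargs)):
            yield from homomorphisms(rest, facts, g)

def violated_ncs(ncs, facts):
    """Return the names of the NCs whose body, read as a BCQ, is satisfied by `facts`."""
    return [name for name, body in ncs.items()
            if next(homomorphisms(body, facts), None) is not None]

facts = {("student", ("ann",)), ("professor", ("bob",)), ("leads", ("ann", "g1"))}
ncs = {
    "disjoint": [("student", ("X",)), ("professor", ("X",))],
    "no_student_leader": [("student", ("X",)), ("leads", ("X", "Y"))],
}
print(violated_ncs(ncs, facts))
```

Here only the second constraint is reported, since the student ann leads a group while no individual is both a student and a professor.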

Key Dependencies. It is well-known that the interaction of general TGDs and key dependencies
(KDs) leads to undecidability of query answering [33]; we assume that the reader is familiar
with the notion of KD (see, e.g., [M]). Thus, the classes of TGDs presented above cannot be
combined arbitrarily with KDs. Suitable syntactic restrictions are needed in order to ensure
decidability of query answering.

A crucial concept towards this direction is separability [35], which formulates a controlled
interaction of TGDs and KDs. Formally speaking, a set Σ = Σ_T ∪ Σ_K over a schema R, where Σ_T
and Σ_K are sets of TGDs and KDs, respectively, is separable iff for every instance I for R, either
I violates Σ_K, or for every BCQ q over R, I ∪ Σ ⊨ q iff I ∪ Σ_T ⊨ q. Notice that separability is a
semantic notion. A sufficient syntactic criterion for separability of TGDs and KDs is given in [7];
TGDs and KDs satisfying the criterion are called non-conflicting.

Obviously, in case of non-conflicting sets of TGDs and KDs, we just need to perform a
preliminary check whether the given instance satisfies the KDs, and if this is the case, then we
eliminate them, and proceed by considering only the set of TGDs. This preliminary check can be
done using negative constraints. For example, to check whether the KD key(r) = {1}, stating
that the first attribute of the binary relation r is a key attribute, is satisfied by the database
D, we just need to check whether the database obtained by adding to D the set of atoms
{neq(a, b) | a ≠ b, and a, b are constants occurring in D}, where neq is an auxiliary predicate,
satisfies the negative constraint r(X,Y), r(X,Z), neq(Y,Z) → ⊥. The atom neq(a, b) implies that a
and b are different constants. Since, as already mentioned, checking NCs is tantamount to query
answering, we immediately get that the complexity of query answering under non-conflicting sets
of TGDs and KDs is the same as in the case of TGDs only.
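The key check via the auxiliary predicate neq can be sketched in a few lines of Python. This is only an illustration under stated assumptions: atoms are encoded as (predicate, arguments) tuples, r is binary with its first attribute as key, and the function names are hypothetical.

```python
from itertools import permutations

def neq_atoms(db):
    """All atoms neq(a, b) for distinct constants a, b occurring in db."""
    consts = {c for _, args in db for c in args}
    return {("neq", pair) for pair in permutations(consts, 2)}

def violates_key_on_first_attribute(db, r):
    """True iff db extended with the neq atoms violates r(X,Y), r(X,Z), neq(Y,Z) -> false."""
    ext = set(db) | neq_atoms(db)
    r_facts = [args for pred, args in ext if pred == r]
    # Evaluate the body of the NC as a join over the extended database.
    return any(x1 == x2 and ("neq", (y, z)) in ext
               for x1, y in r_facts for x2, z in r_facts)

print(violates_key_on_first_attribute({("r", ("a", "b")), ("r", ("a", "c"))}, "r"))
print(violates_key_on_first_attribute({("r", ("a", "b")), ("r", ("c", "b"))}, "r"))
```

The first database repeats the key value a with two different second components and therefore violates the KD; the second one does not.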

Interestingly, by combining non-conflicting linear (or sticky) sets of TGDs and KDs with NCs, 
we get strictly more expressive formalisms than the most widely-adopted tractable ontology
languages, in particular DL-Lite_A, DL-Lite_F and DL-Lite_R, without losing FO-rewritability, and
consequently high tractability of query answering in data complexity. For more details, we refer
the interested reader to [3 [9] . 

5 Datalog± for OBDA

In this section we consider the problem of BCQ answering under the FO-rewritable members of the
Datalog± family, namely, linear, sticky and sticky-join sets of TGDs. Given a BCQ q and a set Σ of
TGDs, the actual computation of the rewriting is done by applying a backward-chaining resolution
procedure using the rules of Σ as rewriting rules. Our algorithm optimizes the algorithm presented
in [H] by greatly reducing the number of BCQs in the rewriting, and therefore improves the overall
performance of query answering. Before going into the details of the rewriting algorithm, we first
give some useful notions.



A set of atoms A = {a1, ..., an}, where n ≥ 2, unifies if there exists a substitution γ, called
unifier for A, such that γ(a1) = ... = γ(an). A most general unifier (MGU) for A is a unifier
for A, denoted as γ_A, such that for each other unifier γ for A, there exists a substitution γ' such
that γ = γ' ∘ γ_A. Notice that if a set of atoms unifies, then there exists an MGU. Furthermore, the
MGU for a set of atoms is unique (modulo variable renaming). The MGU for a singleton set {a}
is defined as the identity substitution on the set of terms that occur in a.
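To make the notion concrete, here is a minimal Python sketch of MGU computation for function-free atoms. The encoding (atoms as (predicate, arguments) tuples, variables starting with an upper-case letter) and the names `mgu` and `apply_subst` are assumptions of this sketch; since terms are only variables and constants, no occurs check is needed.

```python
def is_var(term):
    # Convention assumed for this sketch: variables start with an upper-case letter.
    return term[:1].isupper()

def walk(term, subst):
    """Follow variable bindings until an unbound variable or a constant is reached."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def mgu(atoms):
    """Return an MGU for a non-empty list of atoms as a dict, or None if they do not unify."""
    subst = {}
    pred, args = atoms[0]
    for p, other in atoms[1:]:
        if p != pred or len(other) != len(args):
            return None
        for t1, t2 in zip(args, other):
            t1, t2 = walk(t1, subst), walk(t2, subst)
            if t1 == t2:
                continue
            if is_var(t2):
                subst[t2] = t1
            elif is_var(t1):
                subst[t1] = t2
            else:
                return None  # two distinct constants clash
    # Resolve chains so that a single lookup per term suffices.
    return {v: walk(v, subst) for v in subst}

def apply_subst(subst, atom):
    pred, args = atom
    return (pred, tuple(subst.get(t, t) for t in args))

print(mgu([("t", ("A", "B", "C")), ("t", ("A", "E", "C"))]))
print(mgu([("t", ("X", "X")), ("t", ("a", "b"))]))
```

For a singleton list the function returns the empty dictionary, which acts as the identity substitution.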

Let us now give some auxiliary results which will allow us to simplify our later technical
definitions and proofs. The first such lemma states that we can restrict our attention to TGDs that
have only one head-atom.

Lemma 1 BCQ answering under (general) TGDs and BCQ answering under TGDs with just one 
head-atom are LOGSPACE-equivalent problems. 

Proof. It suffices to show that BCQ answering under (general) TGDs can be reduced in logspace
to BCQ answering under TGDs with just one head-atom. Consider a BCQ q over a schema R, a
database D for R, and a set Σ of TGDs over R. We construct Σ' from Σ by applying the following
procedure. For each TGD σ ∈ Σ, where head(σ) = {a1, ..., ak} and X is the set of variables that
occur in head(σ), replace σ with the following set of TGDs:

    body(σ) → r_σ(X),
    r_σ(X) → a1,
    r_σ(X) → a2,
    ...
    r_σ(X) → ak,

where r_σ is an auxiliary predicate not occurring in R having the same arity as the number of
variables in X. It is not difficult to see that the above construction is feasible in logspace.
By construction, except for the atoms with an auxiliary predicate, chase(D, Σ) and chase(D, Σ')
coincide. The auxiliary predicates, being introduced only during the above transformation, do not
match any predicate symbol in q, and hence chase(D, Σ) ⊨ q iff chase(D, Σ') ⊨ q, or, equivalently,
D ∪ Σ ⊨ q iff D ∪ Σ' ⊨ q. □
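The construction in the proof can be sketched as follows. The representation of a TGD as a (name, body, head) triple, the auxiliary predicate naming scheme, and the function name `split_heads` are assumptions of this sketch; as a simplification, TGDs that already have a single head-atom are left untouched. Variables that occur in the head but not in the body are understood to be existentially quantified.

```python
def is_var(term):
    # Convention assumed for this sketch: variables start with an upper-case letter.
    return term[:1].isupper()

def split_heads(tgds):
    """Replace each TGD (name, body, head) having several head-atoms by single-head TGDs."""
    out = []
    for name, body, head in tgds:
        if len(head) == 1:
            out.append((name, body, head))
            continue
        # X: the variables occurring in the head, in order of first occurrence.
        xs = []
        for _, args in head:
            for t in args:
                if is_var(t) and t not in xs:
                    xs.append(t)
        aux = ("aux_" + name, tuple(xs))  # auxiliary predicate, assumed fresh
        out.append((name + "_0", body, [aux]))
        for i, atom in enumerate(head, 1):
            out.append((f"{name}_{i}", [aux], [atom]))
    return out

sigma = ("s1", [("p", ("X",))], [("t", ("X", "Z")), ("u", ("Z",))])
for rule in split_heads([sigma]):
    print(rule)
```

The example TGD p(X) → ∃Z t(X,Z), u(Z) is split into one rule that produces the auxiliary atom and one rule per original head-atom.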

The next lemma implies that we can restrict our attention to TGDs that have only one
existentially quantified variable which occurs only once.

Lemma 2 BCQ answering under (general) TGDs and BCQ answering under TGDs with at most 
one existentially quantified variable that occurs only once are LOGSPACE-equivalent problems. 

Proof. It suffices to show that BCQ answering under (general) TGDs can be reduced in logspace
to BCQ answering under TGDs that have at most one existentially quantified variable which occurs
only once. Consider a BCQ q over a schema R, a database D for R, and a set Σ of TGDs over
R. We construct Σ' from Σ by applying the following procedure. For each TGD σ ∈ Σ, where
{X1, ..., Xn}, for n ≥ 1, is the set of variables that occur both in body(σ) and head(σ), and
{Z1, ..., Zm}, for m > 1, is the set of the existentially quantified variables of σ, replace σ with the
following set of TGDs:

    body(σ) → ∃Z1 r_σ^1(X1, ..., Xn, Z1),
    r_σ^1(X1, ..., Xn, Z1) → ∃Z2 r_σ^2(X1, ..., Xn, Z1, Z2),
    ...
    r_σ^(m-1)(X1, ..., Xn, Z1, ..., Zm-1) → ∃Zm r_σ^m(X1, ..., Xn, Z1, ..., Zm),
    r_σ^m(X1, ..., Xn, Z1, ..., Zm) → head(σ),

where r_σ^i is an auxiliary predicate of arity n + i, for each i ∈ [m]. It is easy to see that the above
procedure can be carried out in logspace. By construction, except for the atoms with an auxiliary
predicate, chase(D, Σ) and chase(D, Σ') are the same (modulo bijective variable renaming).
The auxiliary predicates, being introduced only during the above construction, do not match any
predicate symbol in q, and hence chase(D, Σ) ⊨ q iff chase(D, Σ') ⊨ q, or, equivalently, D ∪ Σ ⊨ q
iff D ∪ Σ' ⊨ q. □
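A possible rendering of this chain construction in Python is given below. The (body, head) encoding of TGDs, the naming of the auxiliary predicates, and the function `chain_existentials` are assumptions of this sketch; following the statement of the construction, a TGD with at most one existentially quantified variable is returned unchanged.

```python
def is_var(term):
    # Convention assumed for this sketch: variables start with an upper-case letter.
    return term[:1].isupper()

def variables(atoms):
    """Variables of a list of atoms, in order of first occurrence."""
    out = []
    for _, args in atoms:
        for t in args:
            if is_var(t) and t not in out:
                out.append(t)
    return out

def chain_existentials(name, body, head):
    """Introduce the existential variables of body -> head one at a time."""
    body_vars, head_vars = variables(body), variables(head)
    xs = [v for v in body_vars if v in head_vars]      # X1, ..., Xn
    zs = [v for v in head_vars if v not in body_vars]  # Z1, ..., Zm
    if len(zs) <= 1:
        return [(body, head)]
    rules, current = [], body
    for i in range(1, len(zs) + 1):
        aux = (f"aux_{name}_{i}", tuple(xs + zs[:i]))  # auxiliary predicate of arity n + i
        rules.append((current, [aux]))
        current = [aux]
    rules.append((current, head))
    return rules

for rule in chain_existentials("s", [("p", ("X", "Y"))], [("t", ("X", "Z1", "Z2"))]):
    print(rule)
```

Each generated rule introduces exactly one new variable in its head, and the last rule has no existential variable at all.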



Since the transformations given above preserve the syntactic condition of linear, sticky and
sticky-join sets of TGDs, henceforth we assume w.l.o.g. that every TGD has just one atom in its
head which contains only one existentially quantified variable that occurs only once. In the rest of
the paper, for notational convenience, given a TGD σ, we denote by π_σ the position in head(σ) at
which the existentially quantified variable occurs.

We now give the notion of applicability of a TGD to a set of body-atoms of a query. Let us
assume w.l.o.g. that the variables that appear in the query, and those that appear in the TGD,
constitute two disjoint sets. Given a BCQ q, a variable is called shared in q if it occurs more than
once in body(q). Notice that in the case of (non-Boolean) CQs, a variable is shared in q if it occurs
more than once in q (considering also the head of q and not just its body).



Definition 1 (Applicability) Consider a BCQ q over a schema R, and a TGD σ over R. Given
a set of atoms A ⊆ body(q) that unifies, we say that σ is applicable to A if the following conditions
are satisfied: (i) the set A ∪ {head(σ)} unifies, and (ii) for each a ∈ A, if the term at position π in
a is either a constant or a shared variable in q, then π ≠ π_σ.
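The two conditions of Definition 1 can be checked mechanically. The sketch below assumes single-head TGDs with at most one existentially quantified variable, atoms encoded as (predicate, arguments) tuples, variables starting with an upper-case letter, and query and TGD variables already renamed apart; all function names are hypothetical.

```python
from collections import Counter

def is_var(term):
    return term[:1].isupper()

def unifies(atoms):
    """True iff the function-free atoms in `atoms` have a common unifier."""
    subst = {}
    def walk(t):
        while is_var(t) and t in subst:
            t = subst[t]
        return t
    pred, args = atoms[0]
    for p, other in atoms[1:]:
        if p != pred or len(other) != len(args):
            return False
        for t1, t2 in zip(args, other):
            t1, t2 = walk(t1), walk(t2)
            if t1 == t2:
                continue
            if is_var(t2):
                subst[t2] = t1
            elif is_var(t1):
                subst[t1] = t2
            else:
                return False
    return True

def shared_variables(body):
    """Variables occurring more than once in the body of a BCQ."""
    counts = Counter(t for _, args in body for t in args if is_var(t))
    return {v for v, n in counts.items() if n > 1}

def existential_position(tgd_body, head):
    """Index of the existential variable in the head-atom, or None if there is none."""
    body_vars = {t for _, args in tgd_body for t in args}
    pos = [i for i, t in enumerate(head[1]) if is_var(t) and t not in body_vars]
    return pos[0] if pos else None

def is_applicable(tgd_body, head, A, query_body):
    if not unifies(list(A) + [head]):              # condition (i)
        return False
    pi = existential_position(tgd_body, head)
    if pi is None:
        return True
    shared = shared_variables(query_body)          # condition (ii)
    return all(is_var(a[1][pi]) and a[1][pi] not in shared for a in A)

body_s1, head_s1 = [("s", ("X",))], ("t", ("X", "X", "Z"))
q = [("t", ("A", "B", "C")), ("r", ("B", "C"))]
print(is_applicable(body_s1, head_s1, [q[0]], q))
print(is_applicable(body_s1, head_s1, [("t", ("A", "B", "C"))], [("t", ("A", "B", "C"))]))
```

In the first call the variable C is shared and sits at the position of the existential variable, so the TGD is not applicable; once C is no longer shared, it is.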



Let us now introduce the notion of factorizability which, as we explain below, is one of
the main differences between our algorithm and the one presented in [14], and due to which a
perfect rewriting with fewer BCQs is obtained.



Definition 2 (Factorizability) Consider a BCQ q over a schema R, and a TGD σ over R which
contains an existentially quantified variable. A set of atoms S ⊆ body(q), where |S| ≥ 2, that unifies
is factorizable w.r.t. σ if there exists a variable V that occurs in every atom of S only at position
π_σ, and also V does not occur in body(q) \ S.



It is important to clarify that in the case of (non-Boolean) CQs, the notion of factorizability is
defined as above, except that the variable V does not occur in ({head(q)} ∪ body(q)) \ S.



Example 1 (Factorization) Consider the BCQs

    q1 : q() ← t(A,B,C), t(A,E,C)          with S1 = {t(A,B,C), t(A,E,C)}
    q2 : q() ← s(C), t(A,B,C), t(A,E,C)    with S2 = {t(A,B,C), t(A,E,C)}
    q3 : q() ← t(A,B,C), t(A,C,C)          with S3 = {t(A,B,C), t(A,C,C)}

and the TGD σ : s(X), r(X,Y) → ∃Z t(X,Y,Z). Clearly, S1 is factorizable w.r.t. σ since the
substitution {E → B} is a unifier for S1, and also C appears in both atoms of S1 only at position
π_σ. The factorization results in the query q() ← t(A,B,C); notice that σ is not applicable to
S1, but it is applicable to {t(A,B,C)}. On the contrary, despite the fact that S2 unifies, it is
not factorizable w.r.t. σ since C occurs also in body(q2) \ S2. Finally, even if S3 unifies, it is not
factorizable w.r.t. σ since C appears in S3, not only at position π_σ, but also at position t[2].
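The three cases of Example 1 can be replayed with a small Python sketch of the factorizability test. The encoding of atoms, the zero-based position `pi` standing for π_σ, and the function names are assumptions made for illustration only.

```python
def is_var(term):
    return term[:1].isupper()

def mgu(atoms):
    """MGU of function-free atoms as a dict, or None if they do not unify."""
    subst = {}
    def walk(t):
        while is_var(t) and t in subst:
            t = subst[t]
        return t
    pred, args = atoms[0]
    for p, other in atoms[1:]:
        if p != pred or len(other) != len(args):
            return None
        for t1, t2 in zip(args, other):
            t1, t2 = walk(t1), walk(t2)
            if t1 == t2:
                continue
            if is_var(t2):
                subst[t2] = t1
            elif is_var(t1):
                subst[t1] = t2
            else:
                return None
    return {v: walk(v) for v in subst}

def is_factorizable(S, body, pi):
    """S: candidate atoms of the query body; pi: zero-based position of the existential variable."""
    if len(S) < 2 or mgu(S) is None:
        return False
    v = S[0][1][pi]  # the only possible witness: it must occur at pi in every atom of S
    if not is_var(v):
        return False
    if any([i for i, t in enumerate(args) if t == v] != [pi] for _, args in S):
        return False
    rest = [a for a in body if a not in S]
    return all(v not in args for _, args in rest)

def factorize(S, body):
    """Apply the MGU of S to the whole body, dropping duplicate atoms."""
    subst, out = mgu(S), []
    for pred, args in body:
        atom = (pred, tuple(subst.get(t, t) for t in args))
        if atom not in out:
            out.append(atom)
    return out

S1 = [("t", ("A", "B", "C")), ("t", ("A", "E", "C"))]
q2 = [("s", ("C",))] + S1
S3 = [("t", ("A", "B", "C")), ("t", ("A", "C", "C"))]
print(is_factorizable(S1, S1, 2), factorize(S1, S1))
print(is_factorizable(S1, q2, 2))
print(is_factorizable(S3, S3, 2))
```

Only S1 passes the test, and factorizing it yields the single atom t(A,B,C), in line with the example.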



We are now ready to describe the algorithm TGD-rewrite, depicted in Algorithm 1, which is
based on the rewriting algorithm presented in [13]. The perfect rewriting of a BCQ q w.r.t. a
set of TGDs Σ is computed by exhaustively applying (i.e., until a fixpoint is reached) two steps:
factorization and rewriting.



Algorithm 1: The algorithm TGD-rewrite

Input: a BCQ q over a schema R, a set Σ of TGDs over R
Output: the FO-rewriting Q_fin of q w.r.t. Σ

Q_rew := {(q, 1)};
repeat
    Q_temp := Q_rew;
    foreach (q, x) ∈ Q_temp, where x ∈ {0, 1}, do
        /* factorization step */
        foreach σ ∈ Σ do
            q' := factorize(q, σ);
            if notExists((q', y), Q_rew), where y ∈ {0, 1}, then
                Q_rew := Q_rew ∪ {(q', 0)};
        /* rewriting step */
        foreach A ⊆ body(q) do
            foreach σ ∈ Σ do
                if isApplicable(σ, A, q) then
                    q' := γ_{A ∪ {head(σ)}}(q[A/body(σ)]);
                    if notExists((q', 1), Q_rew) then
                        Q_rew := Q_rew ∪ {(q', 1)};
until Q_temp = Q_rew;
Q_fin := {q | (q, x) ∈ Q_rew and x = 1};
return Q_fin



Factorization Step. The function factorize(q, σ), provided that there exists a subset of
body(q) which is factorizable w.r.t. σ (otherwise, the query q is returned), first selects such a set
S ⊆ body(q). Then, the query q' is constructed by applying the MGU γ_S for S on q. Provided
that there is no pair (q'', y), where y ∈ {0, 1}, in Q_rew such that q' and q'' are the same (modulo
bijective variable renaming), the pair (q', 0) is added to Q_rew; the label 0 keeps track of the queries
generated by the factorization step that must be excluded from the final rewriting. This is carried
out by the notExists function.

Rewriting Step. If there exists a pair (q, y) and a TGD σ ∈ Σ which is applicable to a set
of atoms A ⊆ body(q), then the algorithm constructs a new query q' = γ_{A ∪ {head(σ)}}(q[A/body(σ)]),
that is, the BCQ obtained from q by replacing A with body(σ) and then applying the MGU for the
set A ∪ {head(σ)}. Provided that there is no pair (q'', 1) in Q_rew such that q' and q'' are the same
(modulo bijective variable renaming), the pair (q', 1) is added to Q_rew; the label 1 keeps track of
the queries generated by the rewriting step, which will form the final rewriting.
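A single rewriting step can be sketched in Python as follows, using the TGDs and the query of Example 2 as test data. The sketch assumes that applicability has already been verified, that the TGD has a single head-atom, and that query and TGD variables are disjoint; the encoding and the name `rewrite_step` are hypothetical. The resulting queries agree with those of the example only modulo variable renaming.

```python
def is_var(term):
    return term[:1].isupper()

def mgu(atoms):
    """MGU of function-free atoms as a dict, or None if they do not unify."""
    subst = {}
    def walk(t):
        while is_var(t) and t in subst:
            t = subst[t]
        return t
    pred, args = atoms[0]
    for p, other in atoms[1:]:
        if p != pred or len(other) != len(args):
            return None
        for t1, t2 in zip(args, other):
            t1, t2 = walk(t1), walk(t2)
            if t1 == t2:
                continue
            if is_var(t2):
                subst[t2] = t1
            elif is_var(t1):
                subst[t1] = t2
            else:
                return None
    return {v: walk(v) for v in subst}

def rewrite_step(body, A, tgd_body, tgd_head):
    """Replace A by body(sigma) in the query body, then apply the MGU of A ∪ {head(sigma)}."""
    subst = mgu(list(A) + [tgd_head])
    replaced = [a for a in body if a not in A] + list(tgd_body)
    out = []
    for pred, args in replaced:
        atom = (pred, tuple(subst.get(t, t) for t in args))
        if atom not in out:
            out.append(atom)
    return out

q = [("t", ("A", "B", "C")), ("r", ("B", "C"))]
# sigma2 : t(X,Y,Z) -> r(Y,Z), applied to {r(B,C)}
q1 = rewrite_step(q, [("r", ("B", "C"))], [("t", ("X", "Y", "Z"))], ("r", ("Y", "Z")))
print(q1)
# sigma1 : s(X) -> exists Z t(X,X,Z), applied to the factorized query {t(A,B,C)}
q3 = rewrite_step([("t", ("A", "B", "C"))], [("t", ("A", "B", "C"))],
                  [("s", ("X",))], ("t", ("X", "X", "Z")))
print(q3)
```

The first call yields t(A,B,C), t(X,B,C), where X plays the role of the fresh variable; the second yields a single s-atom, since unifying with t(X,X,Z) identifies A and B.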

Example 2 (Rewriting) Consider the set Σ of TGDs

    σ1 : s(X) → ∃Z t(X,X,Z)
    σ2 : t(X,Y,Z) → r(Y,Z)

and the query q : q() ← t(A,B,C), r(B,C). TGD-rewrite first applies σ2 to {r(B,C)} since σ1 is
not applicable. The query q1 : q() ← t(A,B,C), t(V1,B,C) is produced. Clearly, body(q1) is
factorizable w.r.t. σ1 and the query q2 : q() ← t(A,B,C) is obtained. Now, σ1 is applicable to
{t(A,B,C)} and the query q3 : q() ← s(A) is obtained. The perfect rewriting constructed by the
algorithm is the set {q, q1, q3}.

The next example shows that, if the applicability condition is dropped, then TGD-rewrite may
produce unsound rewritings.

Example 3 (Loss of soundness) Suppose that we ignore the applicability condition during the
rewriting process. Consider the set Σ of TGDs given in Example 2, and also the BCQ q : q() ←
t(A,B,c), where c is a constant. A BCQ q' of the form q() ← s(V) is obtained, where the
information about the constant c is lost. Consider now the database D = {s(b), t(a,b,d)} for R.
The query q' maps to the atom s(b), which implies that D ⊨ q'. However, the original query q
does not map to chase(D, Σ), and thus D ∪ Σ ⊭ q. Therefore, any rewriting containing q' is not
a sound rewriting of q given Σ. Consider now the query q'' : q() ← t(A,B,B). The same query
q', mapping to the atom s(b) of D, is obtained. However, during the construction of chase(D, Σ)
it is not possible to get an atom of the form t(X,Y,Y), where at positions t[2] and t[3] the same
value occurs. This implies that there is no homomorphism that maps q'' to chase(D, Σ), and hence
D ∪ Σ ⊭ q''. Therefore, any rewriting containing q' is again unsound.

The applicability condition may prevent the generation of queries that are vital to guarantee 
completeness of the rewritten query, as shown by the following example. This is exactly the reason 
why the factorization step is also needed. 

Example 4 (Loss of completeness) Consider the set Σ of TGDs

    σ1 : p(X) → ∃Y t(X,Y)
    σ2 : t(X,Y) → s(Y)

and the query q : q() ← t(A,B), s(B). The only viable strategy in this case is to apply σ2 to {s(B)},
since σ1 is not applicable to {t(A,B)} due to the shared variable B. The query that we obtain is
q' : q() ← t(A,B), t(V1,B), where V1 is a fresh variable. Notice that in q' the variable B remains
shared, thus it is not possible to apply σ1. It is obvious that without the factorization step there
is no way to obtain the query q'' : q() ← p(A) during the rewriting process. Now, consider the
database D = {p(a)}. Clearly, chase(D, Σ) = {p(a), t(a,z1), s(z1)}, and therefore chase(D, Σ) ⊨ q,
or, equivalently, D ∪ Σ ⊨ q. However, the rewritten query is not entailed by the given database D,
since q'' does not belong to it, which implies that it is not complete.
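The chase of Example 4 is small enough to be reproduced with a naive, round-bounded chase. The Python sketch below is an illustration only: it assumes single-head TGDs given as (body, head) pairs, fires each trigger once, names fresh nulls z1, z2, ..., and stops after a fixed number of rounds, so it is not a general chase procedure for non-terminating cases.

```python
def is_var(term):
    # Rule and query variables start with an upper-case letter; constants and nulls do not.
    return term[:1].isupper()

def homomorphisms(atoms, facts, h=None):
    """Yield each mapping of the variables of `atoms` that sends every atom into `facts`."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        g = dict(h)
        if all(g.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(args, fargs)):
            yield from homomorphisms(rest, facts, g)

def chase(db, tgds, max_rounds=10):
    facts, fired, nulls = set(db), set(), 0
    for _ in range(max_rounds):
        new = set()
        for idx, (body, (pred, args)) in enumerate(tgds):
            for h in homomorphisms(body, facts):
                trigger = (idx, tuple(sorted(h.items())))
                if trigger in fired:
                    continue
                fired.add(trigger)
                for t in args:
                    if is_var(t) and t not in h:  # existential variable: invent a fresh null
                        nulls += 1
                        h[t] = f"z{nulls}"
                new.add((pred, tuple(h.get(t, t) for t in args)))
        if new <= facts:
            break
        facts |= new
    return facts

def entails(facts, query_body):
    return next(homomorphisms(query_body, facts), None) is not None

sigma = [([("p", ("X",))], ("t", ("X", "Y"))),   # sigma1
         ([("t", ("X", "Y"))], ("s", ("Y",)))]   # sigma2
D = {("p", ("a",))}
q = [("t", ("A", "B")), ("s", ("B",))]
print(sorted(chase(D, sigma)))
print(entails(chase(D, sigma), q), entails(D, q), entails(D, [("p", ("A",))]))
```

The query q holds in the chase but not in D itself, whereas p(A) holds in D, which is why a complete rewriting has to contain the query q''.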

We proceed now to establish soundness and completeness of the proposed algorithm. Towards
this aim we need two auxiliary technical lemmas. The first one, which is needed for soundness,
states that once the chase entails the rewritten query constructed by the rewriting algorithm, then
the chase entails also the given query. In the sequel, for brevity, given a BCQ q over a schema R
and a set Σ of TGDs over R, we denote by q_Σ the rewritten query TGD-rewrite(q, Σ).

Lemma 3 Consider a BCQ q over a schema R, a database D for R, and a set Σ of TGDs over
R. If chase(D, Σ) ⊨ q_Σ, then chase(D, Σ) ⊨ q.

Proof. The proof is by induction on the number of applications of the rewriting step. We denote
by q_Σ^[i] the part of the rewritten query obtained by applying i times the rewriting step.
Base Step. Clearly, q_Σ^[0] = q, and the claim holds trivially.

Inductive Step. Suppose now that chase(D, Σ) ⊨ q_Σ^[i], for i > 0. This implies that there
exists p ∈ q_Σ^[i] such that chase(D, Σ) ⊨ p, and thus there exists a homomorphism h such that
h(body(p)) ⊆ chase(D, Σ). If p ∈ q_Σ^[i-1], then the claim follows by induction hypothesis. The
interesting case is when p was obtained during the i-th application of the rewriting step from
a BCQ p' ∈ q_Σ^[i-1], i.e., q_Σ^[i] = q_Σ^[i-1] ∪ {p}. By induction hypothesis, it suffices to show that
chase(D, Σ) ⊨ q_Σ^[i-1].

Clearly, there exists a TGD σ ∈ Σ of the form φ(X,Y) → ∃Z r(X,Z) which is applicable
to a set A ⊆ body(p'), and p is such that body(p) = γ(p'[A/φ(X,Y)]), where γ is the MGU for
the set A ∪ {head(σ)}. Observe that h(γ(φ(X,Y))) ⊆ chase(D, Σ), and hence σ is applicable
to chase(D, Σ); let μ = h ∘ γ. Thus, μ'(r(X,Z)) ∈ chase(D, Σ), where μ' ⊇ μ. We define the
substitution h' = h ∪ {γ(Z) → μ'(Z)}.

Let us first show that h' is a well-defined substitution. It suffices to show that γ(Z) is not a
constant, and also that γ(Z) does not appear in the left-hand side of an assertion of h. Towards a
contradiction, suppose that γ(Z) is either a constant or appears in the left-hand side of an assertion
of h. It is easy to verify that in this case there exists an atom a ∈ A such that at position π_σ in a
occurs either a constant or a variable which is shared in p'. But this contradicts the fact that σ is
applicable to A. Consequently, h' is well-defined. It remains to show that the substitution h' ∘ γ
maps body(p') to chase(D, Σ), and thus chase(D, Σ) ⊨ q_Σ^[i-1]. Clearly, γ(body(p') \ A) ⊆ body(p).
Since h(body(p)) ⊆ chase(D, Σ), we get that h'(γ(body(p') \ A)) ⊆ chase(D, Σ). Moreover,

    h'(γ(A)) = h'(γ(r(X,Z)))
             = r(h'(γ(X)), h'(γ(Z)))
             = r(μ(X), μ'(Z))
             ∈ chase(D, Σ).

The proof is now complete. □

The second auxiliary lemma, which is needed for completeness, asserts that once the chase 
entails the rewritten query constructed by the rewriting algorithm, then the given database also 
entails the rewritten query. 

Lemma 4 Consider a BCQ q over a schema R, a database D for R, and a set Σ of TGDs over
R. If chase(D, Σ) ⊨ q_Σ, then D ⊨ q_Σ.

Proof. We proceed by induction on the number of applications of the chase step.
Base Step. Clearly, chase^[0](D, Σ) = D, and the claim holds trivially.

Inductive Step. Suppose now that chase^[i](D, Σ) ⊨ q_Σ, for i > 0. This implies that there
exists p ∈ q_Σ such that chase^[i](D, Σ) ⊨ p, and thus there exists a homomorphism h such that
h(body(p)) ⊆ chase^[i](D, Σ). If h(body(p)) ⊆ chase^[i-1](D, Σ), then the claim follows by induction
hypothesis. The non-trivial case is when the atom a, obtained during the i-th application of the
chase step due to a TGD σ ∈ Σ of the form φ(X,Y) → ∃Z r(X,Z), belongs to h(body(p)). Clearly,
there exists a homomorphism μ such that μ(φ(X,Y)) ⊆ chase^[i-1](D, Σ) and a = μ'(r(X,Z)),
where μ' ⊇ μ. By induction hypothesis, it suffices to show that chase^[i-1](D, Σ) ⊨ q_Σ. Before we
proceed further, we need to establish an auxiliary technical claim.

Claim 5 There exists a BCQ p' ∈ q_Σ and a set of atoms A ⊆ body(p') such that σ is applicable
to A, and also there exists a homomorphism λ such that λ(body(p') \ A) ⊆ chase^[i-1](D, Σ) and
λ(A) = a.

Proof. Clearly, there exists a set of atoms B such that h(body(p) \ B) ⊆ chase^[i-1](D, Σ) and h(B) =
a. Observe that the null value that occurs in a at position π_σ does not occur in chase^[i-1](D, Σ) or
in a at some position other than π_σ. Therefore, the variables that occur in the atoms of B at π_σ do
not appear at some other position. Consequently, B can be partitioned into the sets B1, ..., Bm,
where m ≥ 1, and the following holds: for each i ∈ [m], in the atoms of Bi at position π_σ the same
variable Vi occurs, and also Vi does not occur in some other set B' ∈ {B1, ..., Bm} \ {Bi} or in Bi
at some position other than π_σ. It is easy to verify that each set Bi is factorizable w.r.t. σ.

Suppose that we factorize B1. Then, the query p1 = γ1(p), where γ1 is the MGU for B1, is
obtained. Observe that h is a unifier for B1. By definition of the MGU, there exists a substitution
θ1 such that h = θ1 ∘ γ1. Clearly,

    θ1(body(p1) \ γ1(B)) = θ1(γ1(body(p)) \ γ1(B))
                         = h(body(p) \ B)
                         ⊆ chase^[i-1](D, Σ),

and θ1(γ1(B)) = h(B) = a.

Now, observe that the set γ1(B2) ⊆ body(p1) is factorizable w.r.t. σ. By applying factorization
we get the query p2 = γ2(p1), where γ2 is the MGU for γ1(B2). Since θ1 is a unifier for γ1(B2),
there exists a substitution θ2 such that θ1 = θ2 ∘ γ2. Clearly,

    θ2(body(p2) \ γ2(γ1(B))) = θ2(γ2(body(p1)) \ γ2(γ1(B)))
                             = θ1(γ1(body(p)) \ γ1(B))
                             = h(body(p) \ B)
                             ⊆ chase^[i-1](D, Σ),

and θ2(γ2(γ1(B))) = θ1(γ1(B)) = h(B) = a.

Eventually, by applying the factorization step as above, we will get the BCQ

    pm = γm ∘ ... ∘ γ1(p),

where γj is the MGU for the set γj-1 ∘ ... ∘ γ1(Bj), for j ∈ {2, ..., m} (recall that γ1 is the MGU
for B1), such that θm(body(pm) \ γm ∘ ... ∘ γ1(B)) ⊆ chase^[i-1](D, Σ) and θm(γm ∘ ... ∘ γ1(B)) = a.

It is easy to verify that σ is applicable to A. The claim follows with p' = pm, A = γm ∘ ... ∘ γ1(B)
and λ = θm. □

The above claim implies that during the rewriting process eventually we will get a BCQ p'' such
that body(p'') = γ(body(p') \ A) ∪ γ(φ(X,Y)), where γ is the MGU for the set A ∪ {head(σ)}. It
remains to show that there exists a homomorphism that maps body(p'') to chase^[i-1](D, Σ). Since
λ ∪ μ' is a well-defined substitution, we get that λ ∪ μ' is a unifier for A ∪ {head(σ)}. By definition
of the MGU, there exists a substitution θ such that λ ∪ μ' = θ ∘ γ. Observe that

    θ(body(p'')) = θ(γ(body(p') \ A) ∪ γ(φ(X,Y)))
                 = (λ ∪ μ')(body(p') \ A) ∪ (λ ∪ μ')(φ(X,Y))
                 = λ(body(p') \ A) ∪ μ'(φ(X,Y))
                 ⊆ chase^[i-1](D, Σ).

Consequently, the desired homomorphism is θ, and the claim follows. □
We are now ready to establish soundness and completeness of the algorithm TGD-rewrite. 

Theorem 6 Consider a BCQ q over a schema R, a database D for R, and a set Σ of TGDs over
R. It holds that D ⊨ q_Σ iff D ∪ Σ ⊨ q.

Proof. Suppose first that D ⊨ q_Σ. Since D ⊆ chase(D, Σ), we get that chase(D, Σ) ⊨ q_Σ,
and the claim follows by Lemma 3. Suppose now that D ∪ Σ ⊨ q. Since q ∈ q_Σ, we get that
chase(D, Σ) ⊨ q_Σ, and the claim follows by Lemma 4. □

Notice that the above result holds for arbitrary TGDs. However, termination of TGD-rewrite 
is guaranteed if we consider linear, sticky or sticky-join sets of TGDs since, during the rewriting 
process, only finitely many queries (modulo bijective variable renaming) are generated. 

Theorem 7 The algorithm TGD-rewrite terminates under linear, sticky or sticky-join sets of 
TGDs. 

Approaches such as those of [5] and [H] resort to exhaustive factorizations of the atoms in the
queries generated by the rewriting algorithm. By factorizing a query q we obtain a subquery q',
that is, q implies q' (w.r.t. the given set of TGDs). Observe that by computing the factorized
query q' we eliminate unnecessary shared variables, in the body of q, due to which the applicability
condition is violated. Consider for example the query q' of Example 4. By factorizing the body
of q' we obtain the query q() ← t(A,B), which is a subquery (w.r.t. the given set Σ of TGDs)
of q' (in this case equivalent to q'), where the variable B is no longer shared. Thus, the rewriting
step can now apply σ1 to {t(A,B)} and produce the query q() ← p(A), which is needed to ensure
completeness.

The exhaustive factorization produces a non-negligible number of redundant queries as demon- 
strated by the simple example above. It is thus necessary to apply a restricted form of factorization 
that generates a possibly small number of BCQs that are necessary to guarantee completeness of 
the rewritten query. This corresponds to the identification of all the atoms in the query whose 
shared existential variables come from the same atom in the chase, and they can be thus unified 
with no loss of information. The key principle behind our factorization process is that, in order to 
be applied, there must exist a TGD that can be applied to the output of the factorization. 

5.1 Exploiting Negative Constraints 

It is well-known that negative constraints (NCs) of the form ∀X φ(X) → ⊥ are vital for representing
ontologies. As already explained in Subsection 4.2, given a database D for a schema R, a set Σ of
TGDs over R, and a set Σ_⊥ of NCs over R, once the theory D ∪ Σ ∪ Σ_⊥ is consistent, then we are
allowed to ignore the NCs since, for every BCQ q, D ∪ Σ ∪ Σ_⊥ ⊨ q iff D ∪ Σ ⊨ q. However, as
shown in the following example, by exploiting the given set of NCs it is possible to further reduce
the size of the final rewriting.

Example 5 Consider the TGD σ : t(X), s(Y) → ∃Z p(Y,Z), the NC ν : r(X,Y), s(Y) → ⊥,
and the BCQ q : q() ← r(A,B), p(B,C). Clearly, due to the rewriting step, the query p : q() ←
r(A,B), t(V1), s(B) is obtained during the rewriting process. However, this query is not really
needed since, for any database D for R, D ⊭ p; otherwise, D violates the NC ν, which is a
contradiction since we always assume that the theory D ∪ {σ, ν} is consistent.

It is not difficult to show that, given a BCQ q, and a set Σ of TGDs, if a query p ∈ q_Σ is not
entailed by chase(D, Σ), for an arbitrary database D, then any query p' ∈ q_Σ obtained during the
rewriting process starting from p is also not entailed by chase(D, Σ). Assume now that the set
Σ_⊥ of NCs is part of the input. If we obtain a query p ∈ q_Σ such that there exists a homomorphism
that maps body(ν), for some NC ν ∈ Σ_⊥, to body(p), then we can safely ignore p since chase(D, Σ)
does not entail p.

From the above informal discussion, we conclude that we can further reduce the size of the final
rewriting by modifying our algorithm as follows. During the execution of the rewriting algorithm
TGD-rewrite (see Algorithm 1), after the factorization step (resp., rewriting step) we check whether
there exists a homomorphism that maps body(ν), for some NC ν of the given set of NCs, to the
body of the generated query q'. If there exists such a homomorphism, then the pair (q', 0) (resp.,
(q', 1)) is not added to the set Q_rew. Furthermore, the pair (q, 1) is added to Q_rew (see the first
line of the algorithm) only if there is no homomorphism that maps body(ν), for some NC ν of the
given set of NCs, to body(q). If there exists such a homomorphism, then the algorithm terminates
and returns the empty set, which means that chase(D, Σ) ⊭ q, for every database D for R.
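The pruning test described above amounts to searching for a homomorphism from the body of an NC into the body of a generated query, where the terms of the query are treated as fixed values. The following Python sketch illustrates it on the queries of Example 5; the encoding and the names `maps_into` and `prune` are assumptions of this sketch, not part of the algorithm as stated.

```python
def is_var(term):
    # Convention assumed for this sketch: variables start with an upper-case letter.
    return term[:1].isupper()

def maps_into(nc_body, query_body, h=None):
    """True iff some homomorphism maps every atom of nc_body to an atom of query_body.

    Variables of the NC may be mapped to any term of the query; the terms of the
    query (variables included) are treated as fixed values.
    """
    h = h or {}
    if not nc_body:
        return True
    (pred, args), rest = nc_body[0], nc_body[1:]
    for qpred, qargs in query_body:
        if qpred != pred or len(qargs) != len(args):
            continue
        g = dict(h)
        if all(g.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(args, qargs)):
            if maps_into(rest, query_body, g):
                return True
    return False

def prune(queries, ncs):
    """Keep only the queries whose body does not contain an image of some NC body."""
    return [q for q in queries if not any(maps_into(nc, q) for nc in ncs)]

nu = [("r", ("X", "Y")), ("s", ("Y",))]
q = [("r", ("A", "B")), ("p", ("B", "C"))]
p = [("r", ("A", "B")), ("t", ("V1",)), ("s", ("B",))]
print(prune([q, p], [nu]))
```

The query p is discarded because the body of ν maps into it via X → A, Y → B, while q is kept.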

6 Rewriting Optimization 

It is common knowledge that the perfect rewriting obtained by applying a backward-chaining 
rewriting algorithm (like TGD-rewrite) is, in general, not very well-suited for execution by a DB 
engine due to the large number of queries to be evaluated. In this section we propose a technique, 
called query elimination, aiming at optimizing the obtained rewritten query under the class of linear 
TGDs. As we shall see, query elimination (which is an additional step during the execution of the 
algorithm TGD-rewrite) reduces (i) the number of BCQs of the perfect rewriting, (ii) the number 
of atoms in each query of the rewriting, as well as (iii) the number of joins. Note that in the rest of
the paper we restrict our attention to linear TGDs. Recall that linear TGDs are TGDs with just
one atom in their body. Since we also assume, as explained in the previous section, TGDs with 
just one atom in their head, henceforth, when using the term TGD, we shall refer to TGDs with 
just one body-atom and one head-atom. 

By exploiting the given set of TGDs, it is possible to identify atoms in the body of a certain 
query that are logically implied (w.r.t. the given set of TGDs) by other atoms in the same query. 
In particular, for each BCQ q obtained by applying the rewriting step of TGD-rewrite, the atoms
of body(q) that are logically implied (w.r.t. the given set of TGDs) by some other atom of body(q)
are eliminated. Roughly speaking, the elimination of an atom from the body of a query implies the 
avoidance of the construction of redundant queries during the rewriting process. Thus, this step 
greatly reduces the number of BCQs in the perfect rewriting. Before going into the details, let us 
first introduce some necessary technical notions. 

Definition 3 (Dependency Graph) Consider a set Σ of TGDs over a schema R. The
dependency graph of Σ is a labeled directed multigraph (N, E, λ), where N is the node set, E is the edge
set, and λ is a labeling function E → Σ. The node set N is the set of positions of R. If there is
a TGD σ ∈ Σ such that the same variable appears at position π_b in body(σ) and at position π_h in
head(σ), then in E there is an edge e = (π_b, π_h) with λ(e) = σ.

Intuitively speaking, the dependency graph of a set Σ of TGDs describes all the possible ways
of propagating a term from a position to some other position during the construction of the chase
under Σ. More precisely, the existence of a path P from π1 to π2 implies that it is possible (but
not always) to propagate a term from π1 to π2. The existence of P guarantees the propagation of
a term from π1 to π2 if, for each pair of consecutive edges e = (π, π') and e' = (π', π'') of P, where
e and e' are labeled by the TGDs σ and σ', respectively, the atom obtained during the chase by
applying σ triggers σ'. To verify whether this holds we need an additional piece of information, the
so-called equality type, about the body-atom and the head-atom of each TGD that occurs in P.

Definition 4 (Equality Type) Consider an atom a of the form r(t1, ..., tn), where n ≥ 1. The
equality type of a is the set of equalities

    {r[i] = r[j] | ti = tj, for i < j} ∪ {r[i] = c | ti = c, for a constant c}.

We denote the above set as eq(a).

It is straightforward to see that, given a pair of TGDs σ and σ', if eq(body(σ')) ⊆ eq(head(σ)),
then there exists a substitution μ such that μ(body(σ')) = head(σ). This allows us to show that
the atom obtained by applying σ during the construction of the chase triggers σ'. Consequently,
the existence of a path P (as above) guarantees the propagation of a term from π1 to π2 if,
for each pair of consecutive edges e and e' of P which are labeled by σ and σ', respectively,
eq(body(σ')) ⊆ eq(head(σ)).

Example 6 (Dependency Graph) Consider the set Σ of TGDs

    σ1 : p(X, Y) → ∃Z r(X, Y, Z)
    σ2 : r(X, Y, c) → s(X, Y, Y)
    σ3 : s(X, X, Y) → p(X, Y).

The equality types of the body-atoms and head-atoms of the TGDs of Σ are as follows:

    eq(body(σ1)) = ∅
    eq(head(σ1)) = ∅
    eq(body(σ2)) = {r[3] = c}
    eq(head(σ2)) = {s[2] = s[3]}
    eq(body(σ3)) = {s[1] = s[2]}
    eq(head(σ3)) = ∅.
Figure 2: Dependency graph for Example 6.

The dependency graph of Σ is shown in Figure 2.
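
As an illustration of Definitions 3 and 4, the following Python sketch builds the labeled edges of
the dependency graph and the equality types for the TGDs of Example 6. It is only a sketch under
stated assumptions: each TGD is linear with a single head atom, terms starting with an uppercase
letter are treated as variables, and the names Atom, TGD, is_var, dependency_graph, eq_type, SIGMA
and EDGES, as well as the labels "s1", "s2", "s3" standing for σ1, σ2, σ3, are hypothetical and
not taken from the Nyaya implementation.

```python
from collections import namedtuple

# An atom is a predicate name plus a tuple of terms. By a convention of this
# sketch only, terms starting with an uppercase letter are variables.
Atom = namedtuple("Atom", "pred args")
# A linear, single-head TGD: a label, one body atom and one head atom.
TGD = namedtuple("TGD", "name body head")


def is_var(term):
    return term[:1].isupper()


def dependency_graph(tgds):
    """Return the labeled edges (body position, head position, TGD label)."""
    edges = []
    for tgd in tgds:
        for i, t in enumerate(tgd.body.args, start=1):
            if not is_var(t):
                continue  # only variables are propagated from body to head
            for j, u in enumerate(tgd.head.args, start=1):
                if t == u:
                    edges.append(((tgd.body.pred, i), (tgd.head.pred, j), tgd.name))
    return edges


def eq_type(atom):
    """Equality type: (pred, i, j) for repeated terms, (pred, i, c) for constants."""
    eqs = set()
    for i, t in enumerate(atom.args, start=1):
        if not is_var(t):
            eqs.add((atom.pred, i, t))
        for j in range(i + 1, len(atom.args) + 1):
            if atom.args[j - 1] == t:
                eqs.add((atom.pred, i, j))
    return eqs


# The TGDs of Example 6.
SIGMA = [
    TGD("s1", Atom("p", ("X", "Y")), Atom("r", ("X", "Y", "Z"))),
    TGD("s2", Atom("r", ("X", "Y", "c")), Atom("s", ("X", "Y", "Y"))),
    TGD("s3", Atom("s", ("X", "X", "Y")), Atom("p", ("X", "Y"))),
]
EDGES = dependency_graph(SIGMA)
```

Under these assumptions, EDGES contains, among others, the edges r[2] → s[2] and r[2] → s[3]
labeled by σ2, and eq_type reproduces the equality types listed in Example 6.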

We are now ready, by exploiting the dependency graph of a set of TGDs, and the equality type 
of an atom, to introduce atom coverage. 

Definition 5 (Atom Coverage) Consider a BCQ q over a schema R, and a set Σ of TGDs over
R. Let a and b be atoms of body(q), where {t_1, …, t_n}, for n ≥ 0, is the set of shared variables
and constants that occur in b. Also, let G_Σ be the dependency graph of Σ. We say that a covers b
w.r.t. q and Σ, written as a ≺_Σ^q b, if for each i ∈ [n]: (i) the term t_i occurs also in a, and
(ii) if t_i occurs in a and b at positions Π_{a,i} and Π_{b,i}, respectively, then there exist an
integer k ≥ 2 and a set of TGDs {σ_1, …, σ_{k−1}} ⊆ Σ, where eq(body(σ_1)) ⊆ eq(a) and, for each
j ∈ [k − 2], eq(body(σ_{j+1})) ⊆ eq(head(σ_j)), such that, for each π ∈ Π_{b,i}, in G_Σ there
exists a path π_{i_1} π_{i_2} … π_{i_k}, where π_{i_1} ∈ Π_{a,i}, π_{i_k} = π, and
λ((π_{i_j}, π_{i_{j+1}})) = σ_j, for each j ∈ [k − 1].

Condition (i) ensures that by removing b from q we do not lose any constant, and also that all
the joins between b and the other atoms of body(q), except a, are preserved. Condition (ii)
guarantees that the atom b is logically implied (w.r.t. Σ) by the atom a, and therefore can be
eliminated.

Lemma 8 Consider a BCQ q over a schema R, and a set Σ of linear TGDs over R. Suppose that
a ≺_Σ^q b, where a, b ∈ body(q), and q' is the BCQ obtained from q by eliminating the atom b. Then,
I ⊨ q iff I ⊨ q', for each instance I that satisfies Σ.

Proof (Sketch). (⇒) By hypothesis, there exists a homomorphism h such that h(body(q)) ⊆ I.
Since, by definition of q', body(q') ⊆ body(q), we immediately get that h(body(q')) ⊆ I, which
implies that I ⊨ q'.

(⇐) Conversely, there exists a homomorphism h such that h(body(q')) ⊆ I, and thus
h(body(q) \ {b}) ⊆ I. It suffices to show that there exists an extension of h which maps b to I.
Since a ≺_Σ^q b, it is not difficult to verify that there exists an atom c ∈ I such that
eq(b) = eq(c), which implies that there exists a substitution μ such that μ(b) = c, and also μ is
compatible with h. Consequently, (h ∪ μ)(body(q)) ⊆ I, and thus I ⊨ q. □

An atom elimination strategy for a BCQ is a permutation of its body-atoms. Given a BCQ
q and a set Σ of linear TGDs, the set of atoms of body(q) that cover a ∈ body(q) w.r.t. Σ,
denoted as cover(a, q, Σ), is the set {b | b ∈ body(q) and b ≺_Σ^q a}; when q and Σ are obvious
from the context, we shall denote the above set as cover(a). By exploiting the cover set of the
atoms of body(q), we associate to each atom elimination strategy S for q a subset of body(q),
denoted eliminate(q, S, Σ), which is the set of atoms of body(q) that can be safely eliminated
(according to S) in order to obtain a logically equivalent query (w.r.t. Σ) with fewer atoms in
its body. Formally, eliminate(q, S, Σ) is computed by applying the following procedure; in the
sequel, let S = [a_1, …, a_n], where {a_1, …, a_n} = body(q):



    A := ∅;
    foreach i := 1 to n do
        a := S[i];
        if cover(a) ≠ ∅ then
            A := A ∪ {a};
            foreach b ∈ body(q) \ A do
                cover(b) := cover(b) \ {a};
    return A.
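
The procedure above can be sketched in Python as follows. This is a minimal illustration, not the
Nyaya code: atoms are represented by arbitrary hashable labels, the cover sets are assumed to be
precomputed and passed in as a dictionary, and the function name eliminate and its parameters
strategy and cover are hypothetical.

```python
def eliminate(strategy, cover):
    """Return the set of atoms that can be dropped, visiting atoms in `strategy` order.

    `strategy` is a list of atoms (a permutation of the query body) and `cover`
    maps each atom to the set of atoms that cover it. The cover sets are copied,
    so the caller's data is left untouched.
    """
    cover = {atom: set(covering) for atom, covering in cover.items()}
    eliminated = set()
    for a in strategy:
        if cover[a]:  # some atom that is still present covers a
            eliminated.add(a)
            for b in strategy:
                if b not in eliminated:
                    cover[b].discard(a)  # a is gone and can no longer cover b
    return eliminated


# Two atoms covering each other: exactly one of them is eliminated, whichever
# strategy is chosen, so the number of eliminated atoms is the same.
MUTUAL = {"x": {"y"}, "y": {"x"}}
FIRST = eliminate(["x", "y"], MUTUAL)
SECOND = eliminate(["y", "x"], MUTUAL)
```

In the mutual-coverage case the two strategies eliminate different atoms but the same number of
atoms, which is the behaviour one would expect from the uniqueness property discussed next.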

By exploiting the fact that the binary relation ≺_Σ^q is transitive, it is possible to establish
the uniqueness (w.r.t. the number of the eliminated atoms) of the atom elimination strategy for a
BCQ. In particular, the following lemma can be shown.

Lemma 9 Consider a BCQ q over a schema R, and a set Σ of linear TGDs over R. Let S_1 and S_2
be arbitrary elimination strategies for q. It holds that |eliminate(q, S_1, Σ)| = |eliminate(q, S_2, Σ)|.

Since the elimination strategy for a query is unique (w.r.t. the number of the eliminated atoms),
in the rest of this section we refer to the set of atoms that can be safely eliminated from a query q
(w.r.t. a set Σ of linear TGDs) by eliminate(q, Σ).

We are now ready to describe how query elimination works. During the execution of the
rewriting algorithm TGD-rewrite (see Algorithm 1), the so-called elimination step is applied after
the factorization step and after the rewriting step. In particular, the factorized query q'
obtained during the factorization step is the query eliminate(factorize(q, σ), Σ), while the
rewritten query obtained during the rewriting step is the query
eliminate(γ_{{a, head(σ)}}(q[a/body(σ)]), Σ). Moreover, instead of adding the given query q to
Q_rew, we add the eliminated query. In particular, the first line of the algorithm is replaced by
Q_rew := {⟨eliminate(q), 1⟩}. An example of query elimination follows.

Example 7 (Query Elimination) Consider the set Σ of TGDs of Example 6 and the BCQ

    q() ← p(A, B), r(A, B, C), s(A, A, D),

where a = p(A, B), b = r(A, B, C) and c = s(A, A, D). Based on Definition 5, it is an easy task
to verify that cover(a) = ∅, cover(b) = {a} and cover(c) = ∅. Therefore, the output of the
function eliminate(q, Σ) is the singleton set {b}. Consequently, by applying the elimination step
we get the BCQ q() ← p(A, B), s(A, A, D).

As already mentioned, the fact that an atom a covers some atom b means that b is logically
implied (w.r.t. the given set of TGDs) by a. However, as shown by the following example, coverage
is not a necessary condition for the implication of b by a.

Example 8 (Atom Implication) Consider the set Σ of TGDs of Example 6 and the BCQ q

    q() ← r(A, A, c), p(A, A),

where a = r(A, A, c), b = p(A, A), and c is a constant of Δ_c. Observe that a does not cover b
since, despite the existence of the paths r[1]s[1]p[1] and r[2]s[3]p[2] in the dependency graph
of Σ, eq(body(σ3)) ⊈ eq(head(σ2)). However, b is logically implied (w.r.t. Σ) by a. In
particular, for every instance I that satisfies Σ, if I ⊨ a, which implies that an atom of the
form r(V, V, c) exists in I, then due to the TGDs σ2 and σ3 there exists also an atom p(V, V),
and thus I ⊨ b. Note that such cases are identified by the C&B algorithm [15]. Nevertheless, as
already discussed in Section 2, this comes at a price in the number of queries in the rewritten
query.
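
The coverage test of Definition 5 can be sketched in Python and checked against Examples 7 and 8.
The sketch below rests on several assumptions that are not part of the paper: TGDs are linear
with a single head atom, terms starting with an uppercase letter are variables, a variable is
"shared" when it occurs more than once in the query body, positions are 0-based, and the search
over sequences of TGDs is a plain breadth-first search. All names (Atom, TGD, covers, reachable,
propagate, SIGMA, Q7, Q8, COVER) are hypothetical.

```python
from collections import Counter, deque, namedtuple

Atom = namedtuple("Atom", "pred args")
TGD = namedtuple("TGD", "body head")  # linear, single-head (assumption)


def is_var(term):
    return term[:1].isupper()  # naming convention of this sketch only


def eq_type(atom):
    eqs = set()
    for i, t in enumerate(atom.args):
        if not is_var(t):
            eqs.add((atom.pred, i, t))
        eqs.update((atom.pred, i, j) for j in range(i + 1, len(atom.args))
                   if atom.args[j] == t)
    return eqs


def propagate(tgd, positions):
    """Head positions reached from the given body positions via shared variables."""
    moved = {tgd.body.args[i] for i in positions if is_var(tgd.body.args[i])}
    return frozenset(j for j, u in enumerate(tgd.head.args) if u in moved)


def reachable(a, target_pred, start, target, tgds):
    """Search for a TGD sequence with compatible equality types reaching `target`."""
    queue = deque((k, propagate(t, start)) for k, t in enumerate(tgds)
                  if t.body.pred == a.pred and eq_type(t.body) <= eq_type(a))
    seen = set()
    while queue:
        state = queue.popleft()
        k, reached = state
        if state in seen or not reached:
            continue
        seen.add(state)
        head = tgds[k].head
        if head.pred == target_pred and target <= reached:
            return True
        for m, nxt in enumerate(tgds):
            if nxt.body.pred == head.pred and eq_type(nxt.body) <= eq_type(head):
                queue.append((m, propagate(nxt, reached)))
    return False


def covers(a, b, query, tgds):
    """Check whether atom a covers atom b, following conditions (i) and (ii)."""
    occurrences = Counter(t for atom in query for t in atom.args)
    shared = {t for t in b.args if not is_var(t) or occurrences[t] > 1}
    for t in shared:
        if t not in a.args:  # condition (i)
            return False
        start = frozenset(i for i, u in enumerate(a.args) if u == t)
        target = {i for i, u in enumerate(b.args) if u == t}
        if not reachable(a, b.pred, start, target, tgds):  # condition (ii)
            return False
    return True


SIGMA = [  # the TGDs of Example 6
    TGD(Atom("p", ("X", "Y")), Atom("r", ("X", "Y", "Z"))),
    TGD(Atom("r", ("X", "Y", "c")), Atom("s", ("X", "Y", "Y"))),
    TGD(Atom("s", ("X", "X", "Y")), Atom("p", ("X", "Y"))),
]
Q7 = [Atom("p", ("A", "B")), Atom("r", ("A", "B", "C")), Atom("s", ("A", "A", "D"))]
COVER = {x: {y for y in Q7 if y != x and covers(y, x, Q7, SIGMA)} for x in Q7}
Q8 = [Atom("r", ("A", "A", "c")), Atom("p", ("A", "A"))]
```

With these inputs, COVER assigns {p(A, B)} to r(A, B, C) and the empty set to the other two atoms,
as in Example 7, while covers(Q8[0], Q8[1], Q8, SIGMA) is false, as observed in Example 8.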



It is not difficult to see that the function eliminate runs in quadratic time in the number of
atoms of body(q) (by considering the given set of TGDs as fixed). In particular, to compute the
cover set of each body-atom of q we need to consider all the pairs of atoms of body(q). Note that
checking whether a certain atom a covers some other atom b is feasible in constant time since
the given set of TGDs (and thus its dependency graph) is fixed.

The following result implies that the rewriting algorithm TGD-rewrite*, obtained from
TGD-rewrite by applying the additional step of elimination, is still sound and complete.

Theorem 10 Consider a BCQ q over a schema R, a database D for R, and a set Σ of linear
TGDs over R. Then, D ⊨ TGD-rewrite*(R, Σ, q) iff D ∪ Σ ⊨ q.

Proof (Sketch). This result follows from the fact that the algorithm TGD-rewrite is sound and
complete under linear TGDs (see Theorem 6) and Lemma 8. □

It is important to clarify that the above result does not hold if we consider arbitrary TGDs (as in
Theorem 6). This is because the proof of Lemma 8, which states that atom coverage implies logical
implication (w.r.t. the given set of TGDs), is based heavily on the linearity of TGDs. Termination
of TGD-rewrite* follows immediately from the fact that TGD-rewrite terminates under linear TGDs
(see Theorem 7).



7 Implementation and Experimental Setting 

TGD-rewrite (without the additional check described in Subsection 5.1) and the query elimination
technique presented in Section 6 have been implemented in the prototype system Nyaya [36],
available at http://mais.dia.uniroma3.it/Nyaya. The reasoning and query answering engine
is based on the IRIS Datalog engine, extended to support the FO-rewritable fragments of the
Datalog± family. In particular, we extended IRIS to natively support existential variables in the
head without introducing function symbols and to support the constant false as head of a rule
(used to represent negative constraints). Both IRIS and our extension are implemented in Java.

Since TGD-rewrite is designed for reasoning over ontologies with large ABoxes, we adopt an
experimental setting similar to that of [19]. Thus, we use DL-Lite_R ontologies with a varying
number of axioms. The queries under consideration are based on canonical examples used in
the research projects where these ontologies have been developed. VICODI (V) is an ontology of
European history, developed in the EU-funded VICODI project. STOCKEXCHANGE (S) is
an ontology for representing the domain of financial institutions of the European Union.
UNIVERSITY (U) is a DL-Lite_R version of the LUBM Benchmark, developed at Lehigh University, and
describes the organizational structure of universities. ADOLENA (A) (Abilities and Disabilities
OntoLogy for ENhancing Accessibility) is an ontology developed for the South African National
Accessibility Portal, and describes abilities, disabilities and devices. The Path5 (P5) ontology is a
synthetic ontology encoding graph structures, used to generate an exponential blowup of the
size of the rewritten queries. Recall that the transformation of a set of TGDs into an equivalent set
of single-head TGDs with a single existential variable can introduce auxiliary predicates and rules
(see Lemmas 1 and 2). The ontologies UX, AX and P5X are equivalent to U, A and P5, respectively,
except that the auxiliary predicates are considered part of the schema. These ontologies allow us
to study the impact of such transformations on the size of the rewriting.

We compared our implementation with two other rewriting-based query answering systems for
FO-rewritable ontologies: QuOnto, based on [5] and developed by the University of Rome La
Sapienza, and Requiem, based on [19] and developed by the Knowledge Representation group of
the University of Oxford.

IRIS: http://www.iris-reasoner.org/
VICODI: http://www.vicodi.org
LUBM: http://swat.cse.lehigh.edu/projects/lubm/
QuOnto: http://www.dis.uniroma1.it/quonto/
Requiem: http://www.comlab.ox.ac.uk/projects/requiem/home.html



Table 1: Evaluation of Nyaya System.

Size                             Length                               Width
QO      RQ     NY     NY*        QO       RQ      NY      NY*         QO      RQ      NY      NY*
3,476   1,953  1,953  1,644      12,006   7,578   7,578   6,263       9,828   5,625   5,625   4,619
23,744  9,766  9,766  8,219      101,652  47,656  47,656  39,531      96,677  37,890  37,890  31,312





Since TGD-rewrite, as well as the algorithms presented in [5] and [19], are proven to be sound
and complete, the most relevant way of judging the quality of the rewriting is the size of the perfect
rewriting, i.e., the number of CQs in the perfect UCQ rewriting. In addition, we use two further
metrics, namely, the length of the rewriting, i.e., the number of atoms in the perfect rewriting,
and the width, i.e., the number of joins to be performed when the rewritten query is executed. We
believe these metrics to be more appropriate than the number of symbols in the rewritten query
used, for example, in [19], since they allow us to establish more precisely the cost of executing
the rewriting on a database system. Table 1 reports the results of our experiments, while Table 2
shows the queries used in the experiments. We use the symbol "-" to denote those cases where the
tool did not complete the rewriting within 15 minutes. By QO and RQ we refer to the QuOnto and
Requiem systems, respectively, while NY and NY* refer to Nyaya with factorisation and Nyaya
with both factorisation and query elimination, respectively. All the tests have been performed
on an Intel Core 2 Duo Processor at 2.50 GHz with 4GB of RAM. The OS is Ubuntu Linux 9.10
running a Sun JVM Standard Edition with maximum heap size set to 2GB.

As can be seen, query elimination provides a substantial advantage in terms of the size of the
perfect rewriting for the real-world ontologies A, U and S. In particular, for the queries denoted as
q2 in U and S, our procedure eliminates all the redundant atoms in the input query, and drastically
reduces the number of queries in the final rewriting. On the other hand, query elimination is not
particularly effective in the synthetic test cases P5 and P5X, since these cases have been
intentionally created in order to generate perfect rewritings of exponential size.



8 Future Work 

We plan to investigate rewriting and optimization techniques for sticky-join sets of TGDs, and 
alternative forms of rewriting such as positive-existential queries. We also plan to develop improved 



Additional data can be found on the Nyaya Web site.



Table 2: Test Queries

TBox    Queries

V       q1(A) ← Location(A).
        q2(A, B) ← Military.Person(A), hasRole(B, A), related(A, C).
        q3(A, B) ← Time.Dependant.Relation(A), hasRelationMember(A, B), Event(B).
        q4(A, B) ← Object(A), hasRole(A, B), Symbol(B).
        q5(A) ← Individual(A), hasRole(A, B), Scientist(B), hasRole(A, C), Discoverer(C),
                hasRole(A, D), Inventor(D).

S       q1(A) ← StockExchangeMember(A).
        q2(A, B) ← Person(A), hasStock(A, B), Stock(B).
        q3(A, B, C) ← FinantialInstrument(A), belongsToCompany(A, B), Company(B),
                hasStock(B, C), Stock(C).
        q4(A, B, C) ← Person(A), hasStock(A, B), Stock(B), isListedIn(B, C), StockExchangeList(C).
        q5(A, B, C, D) ← FinantialInstrument(A), belongsToCompany(A, B), Company(B),
                hasStock(B, C), Stock(C), isListedIn(B, D), StockExchangeList(D).

U(X)    q1(A) ← worksFor(A, B), affiliatedOrganizationOf(B, C).
        q2(A, B) ← Person(A), teacherOf(A, B), Course(B).
        q3(A, B, C) ← Student(A), advisor(A, B), FacultyStaff(B), takesCourse(A, C),
                teacherOf(B, C), Course(C).
        q4(A, B) ← Person(A), worksFor(A, B), Organization(B).
        q5(A) ← Person(A), worksFor(A, B), University(B), hasAlumnus(B, A).

A(X)    q1(A) ← Device(A), assistsWith(A, B).
        q2(A) ← Device(A), assistsWith(A, B), UpperLimbMobility(B).
        q3(A) ← Device(A), assistsWith(A, B), Hear(B), affects(C, B), Autism(C).
        q4(A) ← Device(A), assistsWith(A, B), PhysicalAbility(B).
        q5(A) ← Device(A), assistsWith(A, B), PhysicalAbility(B), affects(C, B), Quadriplegia(C).

P5(X)   q1(A) ← edge(A, B).
        q2(A) ← edge(A, B), edge(B, C).
        q3(A) ← edge(A, B), edge(B, C), edge(C, D).
        q4(A) ← edge(A, B), edge(B, C), edge(C, D), edge(D, E).
        q5(A) ← edge(A, B), edge(B, C), edge(C, D), edge(D, E), edge(E, F).



techniques for rewriting an ontological query into a non-recursive Datalog program, rather than into
a union of conjunctive queries (recall the discussion in Section 2). While the current approaches
yield exponentially large non-recursive Datalog programs, it is possible to rewrite queries and
TBoxes into non-recursive Datalog programs whose size is simultaneously polynomial in the query
and the TBox. This will be dealt with in a forthcoming paper.